WEBSPHERE ONLINE TRAINING - PRADHIKA TECHNOLOGY: October 2011

Tuesday, 11 October 2011

WebSphere Process Server operational architecture: Part 1: Base architecture and infrastructure components

www.pradhikatechnology.com

Rakesh Kumar:

Contact No: 0091-9700330693.
001- 3237843863.


Introduction

WebSphere Process Server consists of many closely connected components and makes use of many different concepts, such as Service Component Architecture (SCA), Business Process Choreographer (BPC), and Service Integration Bus (SIB). Although these concepts are well documented, it is difficult to get an overview of how they work together within IBM WebSphere Process Server. Since this knowledge is crucial for the successful operation of business process applications running on WebSphere Process Server, this article provides a detailed view of how these concepts relate technically. It describes which resources (for example, SIB Destinations) are involved when running SCA modules and how they relate to each other. Moreover, the article covers the technical background of BPC (including the execution of BPEL processes) and emphasizes its relation to underlying concepts like SCA. Finally, it highlights the error conditions that can occur and shows how they can be identified (Failed Event Manager, exception queues, and so forth) from an operational point of view.

This article gives IT architects and WebSphere administrators an opportunity to understand the operational background of WebSphere Process Server and the relationships between these concepts and technologies, which helps them better manage their SOA infrastructure.

The final article in this series, Operational view on WebSphere Process Server V6.1, Part 2: SCA Runtime, Business Process Choreographer and Supporting Services, further explores the operational architecture of IBM WebSphere Process Server. You'll learn which components build WebSphere Process Server's runtime layer and how they work together in an operational environment from a technical point of view. In this respect, you'll understand how SCA modules look at runtime and how Business Process Choreographer manages your business processes. You'll gain a holistic view of Process Server's operational architecture and moreover you'll get a better understanding of how to establish a successful WebSphere Process Server operation in your organization.

WebSphere Process Server reference model

Nowadays every IT person, whether an IT specialist or a more business-focused consultant, has a personal definition of what SOA is or is not. This begins with a thousand and one explanations of the term “service” and ends with millions of possible implementation techniques. This article does not attempt to define SOA but instead tries to establish a common high-level understanding of what SOA means and of the role WebSphere Process Server plays in a service-oriented world. Finally, you will be introduced to WebSphere Process Server's building blocks and to technical details that will help you understand the WebSphere Process Server architecture from an operations perspective.

The role of WebSphere Process Server in a Service Oriented Architecture

Let us begin with the term SOA. Since you cannot buy SOA and SOA is not a product, we can define it as a concept. Conceptually, SOA structures IT infrastructure in a particular way, where the focus lies on component orientation, composition, reuse, and flexibility. Moreover, the concept aligns IT infrastructure with business needs, with the business itself driving all initiatives. In other words, SOA defines an architectural style, nothing more, nothing less.

From a conceptual point of view, several crucial requirements for the technical implementation of an SOA must exist:

A common approach, or common language for defining components and interfaces
Contracts between different components through a common interface definition
A platform to enable components to work together in a corporate, or even an inter-corporate, environment
A method to orchestrate components and to foster composition

In addressing these requirements, the Open Service Oriented Architecture (OSOA) initiative has defined a new industry standard called Service Component Architecture (hereafter SCA); see Service Component Architecture (SCA) specifications in the Resources section for more information. The core idea behind SCA is to establish a common component model for the definition of business services. In this respect, the term service means an abstract service that is provided by a generic service provider; for example, a “bank inquiry” or “loan approval”. From a more technical point of view, SCA defines the frame around business components (via a common interface definition) and enables them to communicate with each other across corporate boundaries. Due to its broad acceptance by a large number of software manufacturers, SCA can be considered the de facto specification for implementing SOA as an architectural style (see Figure 1).

Figure 1. Classification and demarcation of WebSphere Process Server

As already discussed, SCA defines how components and interfaces can be defined and how you can call them. However, the OSOA initiative does not provide any actual implementation that you can use as an execution platform.

At this point, WebSphere Process Server comes into play, since it is IBM's implementation of the SCA (and related) specifications. With its role as an execution platform for SCA applications, WebSphere Process Server is one of the main building blocks in IBM's SOA portfolio. Furthermore, it is the runtime platform for business processes (for example, BPEL processes, Human Tasks, and so forth) developed with WebSphere Integration Developer, and therefore, plays a very important role in the SOA infrastructure of a company.

WebSphere Process Server reference architecture

As mentioned earlier, WebSphere Process Server is IBM's execution platform for so-called SCA modules (applications that implement the SCA specification). Due to its central role in a service-oriented environment, it is crucial to understand the server's main building blocks and how they relate to each other. The following illustration introduces an operational reference architecture, which serves as a starting point and is used throughout the article to explain the technical relationships between the components (see Figure 2).

Figure 2. Operational reference architecture of WebSphere Process Server

The WebSphere Process Server architecture is mainly defined by three layers that build the very basis for business applications:

Infrastructure layer
Runtime layer
Function layer

All of these layers depend on each other from the bottom up.

Infrastructure layer

The components in the Infrastructure layer provide the basic functionality for all the layers above. One core function is the persistence of business data (from business applications or Business Process Choreographer, for example) and of runtime data produced by messaging facilities and other components. Since the validity and integrity of this data is crucial for WebSphere Process Server's role, an enterprise-scale database system is used for this purpose (see WebSphere Process Server: Detailed list of system requirements in the Resources section for a detailed list of options).

Furthermore, the Infrastructure layer provides a variety of messaging resources that are based on WebSphere's Service Integration Bus technology (also referred to as WebSphere Platform Messaging). On the one hand, these are heavily used by the SCA runtime for asynchronous communication and buffering. On the other hand, the resources are also leveraged by almost all components in the Function layer.

Runtime layer

This layer provides the basic service component functionality from an SCA point of view. It incorporates IBM's implementation of the SCA standard, called the SCA runtime hereafter. This part of the layer provides the ability to run SCA modules and enables the contained components to communicate with each other. A very important aspect of this communication is that the technical foundation (and also the concrete implementation of the functionality) is completely hidden by the SCA artifacts (so a developer just defines “component A makes a synchronous call to component B”). The SCA runtime is responsible for choosing the right communication channel and for doing the actual work. In this respect, WebSphere Process Server heavily uses the underlying messaging capability located in the Infrastructure layer.

Besides the additional functionality provided by the SCA runtime, WebSphere Process Server also includes core WebSphere Application Server capabilities. The application server enables the use of technologies like JMS (including MQ JMS connectivity), Web Services, J2EE container services, JDBC database access, and so forth. The SCA runtime uses these to provide developers with different kinds of binding types, such as Web Services Binding or EJB Binding. Consider a “binding” as the technical realization of an SCA-like interface to the outside world; see Service Component Architecture (SCA) specifications in the Resources section for a list of generally possible types.

In summary, the Runtime layer builds the core of WebSphere Process Server's function as an execution platform for applications which are implemented according to the SCA assembly model.

Function layer

As described in the previous sections, the Infrastructure and the Runtime layer provide core services for running SCA applications. In addition to the actual implementation of the SCA assembly model, WebSphere Process Server features several SCA component implementation types (such as BPEL or Java™) and many complementary services that make the integration developer's life much easier.

One of the major components in this layer is the Business Process Choreographer, which can be considered the execution container for business processes (using the SCA BPEL Implementation Model; see Service Component Architecture (SCA) specifications in the Resources section for more information) and human tasks (see WS-BPEL Extension for People in the Resources section). Since SOA is a rather business-centric approach, many SCA modules make use of the Business Process Execution Language (BPEL) to a certain extent and will therefore come into contact with Business Process Choreographer from an operational point of view.

Besides the BPEL component, WebSphere Process Server supports a number of additional components and services that are not directly part of the SCA standard but may be leveraged by integration developers. Two of the more common ones are Selectors and Business Rules (see Make composite business services adaptable with points of variability, Part 3: Using selectors and business rules for dynamicity in the Resources section for more information). These components influence the overall operational perspective on a WebSphere Process Server environment, since they use the underlying messaging and persistence capabilities and because modules in the Business Application layer can rely on them.

Finally, the Common Event Infrastructure (hereafter called CEI) is located in the Function layer. CEI can be considered a framework for logging business events and for auditing or monitoring the execution of SCA components (and even business processes). The framework integrates tightly into IBM's SCA implementation, so CEI can be used to generate a large variety of events that can be distributed to a variety of targets. For instance, a BPEL developer may use CEI events to report execution statistics of an SCA component to a monitoring console (consider IBM Tivoli Enterprise Portal). CEI is not directly related to the SCA standards themselves, but it is highly interoperable with other IBM products, such as IBM WebSphere Business Monitor (see Business Activity Monitoring with WebSphere Business Monitor V6.1 in the Resources section for more information), to enable Business Activity Monitoring (BAM) solutions.

Business Application layer

The actual SCA modules are closely connected to WebSphere Process Server's Function layer, since from an operational point of view they make heavy use of its capabilities, such as BPEL, human tasks, CEI events, and so on (this complexity is hidden from the developer). All other layers are more or less hidden from the applications.

In summary, WebSphere Process Server's operational reference architecture can roughly be structured in three layers. The lowest, the Infrastructure layer, provides basic capabilities such as persistence and messaging. The Runtime layer contains the actual runtime implementation of the SCA assembly model and enables WebSphere Process Server to be an execution platform for SCA modules. Finally, the Function layer exposes several SCA component implementations (such as BPEL, Java, Human Tasks) to be leveraged by business application developers.

Operational view on WebSphere Process Server components

In this section, you are introduced to the general concepts behind the Service Integration Bus (SIB) technology that is used for asynchronous communication within WebSphere Process Server. First, the SIB-related technical terms are clarified and set in relation to each other by joining them into a layer model. The second part of this section gives an overview of the different SI-Buses that are used in an operational WebSphere Process Server environment and of how interaction with them takes place.

Service Integration Bus overview

SI-Buses were introduced in WebSphere Application Server V6 and replaced WebSphere MQ Series as the former internal platform messaging implementation. The implementation is completely Java based and all objects exist within the WebSphere Application Server runtime.

The following picture provides an overview of the different components that belong to WebSphere Platform Messaging technology. More precisely speaking, it shows four layers that you should interpret top-down (for a better understanding).

Figure 3. Layered SIB architecture

Application layer

Java EE applications that make use of asynchronous messaging standards and concepts reside on the Application layer. For example, a Message Driven Bean (MDB) uses the Java Message Service (JMS) standard to process messages asynchronously.

JMS layer

The JMS layer encapsulates the concrete implementation of a JMS provider by introducing the concept of so-called JMS artifacts. A JMS artifact represents a logical object that points to a physical resource. One kind of JMS artifact is the JMS Destination, of which JMS knows two types: JMS Queues and JMS Topics. The former are used for point-to-point messaging and the latter in publish-subscribe scenarios. Other JMS artifacts are JMS Connection Factories, which abstract the connection handling for JMS Destinations; for example, pooling concepts are hidden from the JMS API user. A JMS Activation Specification is also a JMS artifact. An Activation Specification is based on the Java Connector Architecture (JCA), which provides a standardized way to communicate with enterprise information systems using Java. JMS Activation Specifications represent a group of messaging configuration properties that can differ for each JMS provider.

Logical SIB layer

The logical SIB layer links the artifacts introduced by the JMS standard to WebSphere-specific resources. Although acting as a JMS provider is only one of many functions of the WebSphere Default Messaging, some core concepts of an SIB are best described with JMS in mind.

A bus can be seen as an architectural concept that is used by message-based applications and is defined by a set of servers or clusters, which are generally referred to as Bus Members. For the connection between the JMS layer and the SIB, a Java 2 Connector Architecture (JCA) compliant Resource Adapter is used. A Resource Adapter implements a standardized way to connect a specific system (in this case the SIB) to a Java-based environment. The only requirement is that the Java environment acts as a JCA provider. Every Java EE compliant application server, such as WebSphere Application Server, does so, and therefore a Resource Adapter fits well. Some of the advantages that come out of the box with a Resource Adapter are transactional integrity and fine-grained Quality-of-Service concepts. The SIB JMS Resource Adapter is described in more detail below.

On the logical SIB layer, there are additional components named Queues and Topics. It is important to keep in mind that these terms mean something different from JMS Queues and JMS Topics. A JMS Queue and an (SIB) Queue have a one-to-one relationship, but they differ with respect to their placement and their technical realization. In the same manner, Destinations are different from JMS Destinations and are only an umbrella term for SIB Queues and Topics. The model above shows that a Destination (for example, a Queue) is uniquely assigned to a Bus Member.

Physical SIB layer

So far, only the abstraction layers have been described in this article. Nothing has yet been said about the physical model of SI-Buses. Each Destination has to have at least one persistent store in order to save its contents and its context data. Furthermore, a technical component is needed to write this data to the persistent store and read it back. The component that fulfills this requirement in an SIB is called a Messaging Engine (ME). Each ME is related either to a database or to a file store to persist its data.

The physical objects representing SIB Destinations are called Message Points. For each kind of Destination, a different term is used for the corresponding Message Point: a Destination of the type "Queue" is represented by Queue Points and a Destination of the type "Topic" is represented by "Publication Points". Each Message Point belongs to exactly one Destination but several Message Points could exist for one Destination (in case of cluster bus members). A Message Point is always assigned uniquely to one ME.
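The relationships between Destinations, Message Points, and Messaging Engines can be summarized in a small data model. The following Python sketch is only an illustration: the class names and the example values (`ME1`, `ME2`, `OrderQueue`, the store labels) are invented here and are not WebSphere APIs or configuration names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MessagingEngine:
    """Persists message data in one database or file store."""
    name: str
    store: str  # hypothetical label for the database schema or file store


@dataclass
class MessagePoint:
    """Physical representation of a Destination, assigned to exactly one ME."""
    kind: str  # "Queue Point" or "Publication Point"
    engine: MessagingEngine


@dataclass
class Destination:
    """Logical SIB Queue or Topic; may have several Message Points."""
    name: str
    dest_type: str  # "Queue" or "Topic"
    message_points: List[MessagePoint] = field(default_factory=list)

    def add_message_point(self, engine: MessagingEngine) -> MessagePoint:
        kind = "Queue Point" if self.dest_type == "Queue" else "Publication Point"
        point = MessagePoint(kind=kind, engine=engine)
        self.message_points.append(point)
        return point


# Hypothetical example: one Queue on a cluster bus member with two MEs.
me1 = MessagingEngine("ME1", store="store_for_ME1")
me2 = MessagingEngine("ME2", store="store_for_ME2")
queue = Destination("OrderQueue", "Queue")
queue.add_message_point(me1)
queue.add_message_point(me2)

for point in queue.message_points:
    print(point.kind, "on", point.engine.name, "persisted in", point.engine.store)
```

Each Message Point holds a reference to a single engine, which mirrors the rule that a Message Point is always assigned to one ME, while the list on the Destination allows several Message Points for cluster bus members.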

View on the operational Messaging Engine architecture

When talking about WebSphere Process Server reference topologies (as described in Production Topologies for WebSphere Process Server and WebSphere ESB V6 in the Resources section), there will most likely be only one ME running per bus member and bus, which actually is considered as the default behavior. In the case of simpler WebSphere Process Server topologies, like non-clustered ones, this is the only possible scenario. Considering more complex topologies, like the so called “Remote Messaging and Remote Support” topology (speaking of WebSphere Process Server V6.0, this was referred to as Full Support or Golden topology), clusters are used to enable the high availability of the WebSphere Process Server messaging infrastructure.

In this respect, you can use different configurations, which are best described using a cluster of two cluster members assigned to an SIB. The default behavior is that only one ME is active, on one cluster member, and that it becomes active on the other cluster member only if the first one fails (see Figure 4). As stated earlier, this scenario enables messaging high availability and is usually seen in enterprise-scale WebSphere Process Server environments.

Figure 4. Different Messaging Engine configurations

When performance and scalability are the priority, another setup is possible in which each cluster member hosts its own active ME and each ME owns a Message Point for a Destination assigned to the Bus Member. Messages on the Destination can therefore be processed by both cluster members. Nevertheless, such a setup has some drawbacks:

Message order cannot be guaranteed, because each message may be processed by any one of the cluster members.
With only two cluster members, failover is not possible, because each ME has its own data store assigned and only one ME per cluster member can be active for one SIB. As a result, data of Message Points cannot be transferred from one ME to another.

The concept of using more than one active ME per Bus Member is referred to as Partitioned Destinations (see figure above). However, the usage of Partitioned Destinations is currently not recommended in terms of WebSphere Process Server reference topologies.

Access to the messaging infrastructure

As mentioned above, a JCA Resource Adapter is used to link JMS resources and the SIB. This "SIB JMS Resource Adapter" manages the JMS artifacts, in particular the JMS Activation Specifications, where properties like the bus, the destination name, and concurrency information for processing are stored. It is important to realize that the Activation Specification settings are a focal point for adjusting performance, because they influence how many messages may be processed concurrently (in their functional role, they are comparable with Thread Pools).

The SCA runtime does not use the SIB JMS Resource Adapter but the "Platform Messaging Component SPI Resource Adapter", where SPI stands for Service Provider Interface. This Resource Adapter hosts the Activation Specifications of SCA Modules, which are also important with respect to overall performance and processing in asynchronous communication paths. During the deployment of SCA Modules, no Connection Factories are created explicitly on the SIB SPI Resource Adapter; instead, the SCA runtime creates Connection Factories programmatically using the low-level SIB Core API. Therefore the properties of these Connection Factories, such as the Connection Pool settings, are not visible and cannot be changed by an administrator for tuning purposes. For the SIB JMS Resource Adapter, this is different: Connection Factories are created explicitly, and therefore all settings can be seen in the WebSphere Admin Console. The following figure visualizes the differences between both Resource Adapters.

Figure 5. WebSphere Process Server Communication through SIB Resource Adapters

Operational bus infrastructure

The following figure shows the four SI-Buses used within WebSphere Process Server. This section gives only a short overview about the buses. The details of the destinations are discussed in the subsequent sections.

Figure 6. Process Server's operational messaging architecture

The BPC Bus is used by the Business Process Choreographer for executing long-running processes. Messages on the BPC Bus are so-called navigational messages, which hold status and processing information about the corresponding processes. For example, the transition from one activity to another may lead to a navigational message if each activity participates in its own transaction. The transactional behavior can only be changed at development time and can neither be viewed nor changed at runtime.
Setting optimal transaction boundaries is one of the key performance aspects of process implementations, for two reasons. First, transaction context switches are quite expensive operations in terms of computation time. Second, crossing transactional boundaries leads to a navigational message in the case of long-running processes, which is also more expensive than staying within the same transaction.
The SCA System Bus is used by the SCA runtime for asynchronous communication between SCA components and SCA modules. The messages flowing through this bus contain business data (originating from requests between the different SCA artifacts) and are not used for navigational purposes, unlike messages on the BPC Bus.
The SCA Application Bus is most commonly used for custom destinations of SCA Modules. For example, the JMS Destinations referred to by a JMS Export should refer to Destinations residing on this bus.
The CEI Bus is used for asynchronous event transmission. The messages on this bus are the events that are picked up by a special MDB and transferred to the so-called Event Server application for further processing.
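To make the cost of transaction boundaries on the BPC Bus more concrete, the following Python sketch counts boundary crossings in a sequence of activities. It is a deliberately simplified model, not Business Process Choreographer logic: it assumes a purely sequential long-running process and one navigational message per crossing, and the function name and transaction labels are hypothetical.

```python
from typing import Sequence


def count_navigational_messages(transaction_of_activity: Sequence[str]) -> int:
    """Count transitions between consecutive activities that cross a transaction boundary."""
    return sum(
        1
        for current, following in zip(transaction_of_activity, transaction_of_activity[1:])
        if current != following
    )


# Four sequential activities, each in its own transaction: three boundary crossings.
separate = ["tx1", "tx2", "tx3", "tx4"]
# The same four activities grouped into two transactions: one boundary crossing.
grouped = ["tx1", "tx1", "tx2", "tx2"]

print(count_navigational_messages(separate))  # 3
print(count_navigational_messages(grouped))   # 1
```

Under these assumptions, grouping activities into fewer transactions reduces the number of navigational messages, which is why transaction boundaries are a design-time performance lever.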

Database

This section gives an overview of the database architecture of WebSphere Process Server. The following picture shows a logical view of the different databases used by WebSphere Process Server.

Figure 7. Operational persistence architecture

By default, WebSphere Process Server V6.1 installs all database objects into one single database. However, the database objects of the different data silos are not required to share one database, so a common practice is to place the silos in separately manageable databases for performance and scalability reasons.

The WPRCSDB silo contains configuration and cell-wide data of WebSphere Process Server. For example, data about some SCA components (Business Rules, Selectors, Relationships) is stored here. Furthermore, the data silo contains the Failed Event data. Because Failed Event information is needed in some cases to restart failed SCA flows, the availability of this data is quite important. This database exists just once per WebSphere Process Server cell.
The MEDB silo is used by the Messaging Engines running in the WebSphere Process Server messaging infrastructure (see the previous section for more details) to persist messaging data: transaction context, message payload, and administrative information. Each (active) Messaging Engine needs its own schema within this database.
The EVENT silo contains the data of the Common Event Infrastructure. If the option to persist all event data is selected, the data is placed here.
The BPEDB silo contains Business Process Choreographer data, including the template data of processes and human tasks, status information of long-running processes, and so forth. This data silo can exist several times in a cell if more than one server or cluster is configured for running Business Processes or Human Tasks.

Each of the data silos mentioned corresponds to one or more so-called Data Sources within the WebSphere Process Server cell. A Data Source provides the connection properties for the underlying database along with connection pooling mechanisms. It is important to understand that these settings affect how WebSphere Process Server performs. A good example is the maximum number of connections to the BPEDB. This value limits the concurrency of long-running business processes to that number, because each process needs to write data to the BPEDB data silo.
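The effect of a connection pool limit on concurrency can be illustrated with a generic simulation. The Python sketch below does not use any WebSphere API; it models the pool as a semaphore, and the pool size, the number of process instances, and the sleep that stands in for a database write are all made-up values.

```python
import threading
import time

MAX_CONNECTIONS = 3   # hypothetical maximum pool size of the Data Source
PROCESSES = 10        # hypothetical number of concurrent process instances

pool = threading.BoundedSemaphore(MAX_CONNECTIONS)
lock = threading.Lock()
in_use = 0
peak_in_use = 0


def run_process() -> None:
    """Simulate one process instance that needs a connection to persist its state."""
    global in_use, peak_in_use
    with pool:  # blocks while all connections are checked out
        with lock:
            in_use += 1
            peak_in_use = max(peak_in_use, in_use)
        time.sleep(0.01)  # stands in for the database write
        with lock:
            in_use -= 1


threads = [threading.Thread(target=run_process) for _ in range(PROCESSES)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print("peak concurrent database writers:", peak_in_use)
```

However many process instances are started, the peak number of simultaneous writers never exceeds the pool size, which is the behavior described for the BPEDB Data Source.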

Conclusion

This article introduced an operational reference architecture for WebSphere Process Server that you can use as a basis for all operational efforts around your WebSphere Process Server enabled SOA infrastructure. In addition, you have learned that WebSphere Process Server makes heavy use of messaging and persistence features and that the proper operation of these plays a significant role on the road to a successful service-oriented environment.

In the final article of this series, Operational view on WebSphere Process Server V6.1, Part 2: SCA Runtime, Business Process Choreographer and Supporting Services, you'll learn about the components located in the Runtime layer and Function layer from an operational point of view.

Monday, 10 October 2011

IBM HTTP Server Performance Tuning


2. Determining maximum simultaneous connections

The first tuning decision you'll need to make is determining how many simultaneous connections your IBM HTTP Server installation will need to support. Many other tuning decisions are dependent on this value.

For some IBM HTTP Server deployments, the amount of load on the web server is directly related to the typical business day, and may show a load pattern such as the following:

[Chart: simultaneous connections (vertical axis, 1 to 2000) against time of day (horizontal axis, 7am to 5pm). Load climbs from near zero at the start of the day to a peak of about 2000 and falls off again toward the end of the business day.]

For other IBM HTTP Server deployments, providing applications which are used in many time zones, load on the server varies much less during the day.

The maximum number of simultaneous connections must be based on the busiest part of the day. This maximum number of simultaneous connections is only loosely related to the number of users accessing the site. At any given moment, a single user can require anywhere from zero to four independent TCP connections.

The typical way to determine the maximum number of simultaneous connections is to monitor mod_status reports during the day until typical behavior is understood, or to use mod_mpmstats (2.0.42.2 and later).

Monitoring with mod_status

Add these directives to httpd.conf, or uncomment the ones already there:

# This example is for IBM HTTP Server 2.0 and above
# Similar directives are in older default configuration files.

LoadModule status_module modules/mod_status.so
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    # Replace .example.com with "." + your domain name
    Allow from .example.com
</Location>

Request the /server-status page (http://www.example.com/server-status/) from the web server at busy times of the day and look for a line like the following:

192 requests currently being processed, 287 idle workers

The number of requests currently being processed is the number of simultaneous connections at this time. Taking this reading at different times of the day can be used to determine the maximum number of connections that must be handled.
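Taking this reading repeatedly is easier with a small script. The following Python sketch extracts the two numbers from the status line shown above; the function name is hypothetical, and the sketch assumes the page contains the line in exactly that wording.

```python
import re
from typing import Optional, Tuple

STATUS_LINE = re.compile(
    r"(\d+) requests currently being processed, (\d+) idle workers"
)


def parse_server_status(page_text: str) -> Optional[Tuple[int, int]]:
    """Return (busy, idle) from a mod_status page, or None if the line is absent."""
    match = STATUS_LINE.search(page_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "192 requests currently being processed, 287 idle workers"
print(parse_server_status(sample))  # (192, 287)
```

The page text itself could be fetched, for example, with `urllib.request.urlopen` from a host that the Allow directive permits; recording the first number at intervals gives the daily connection profile.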

Monitoring with mod_mpmstats (IBM HTTP Server 2.0.42.2 and later)

IHS 6.1 and earlier: Copy the version of mod_mpmstats.so for your operating system from the ihsdiag package to the IBM HTTP Server modules directory. (Example filename: ihsdiag-1.4.1/2.0/aix/mod_mpmstats.so)
Add these directives to the bottom of httpd.conf:

LoadModule mpmstats_module modules/mod_mpmstats.so
ReportInterval 90

For IBM HTTP Server 7.0 and later, mod_mpmstats is enabled automatically.

Check entries like this in the error log to determine how many simultaneous connections were in use at different times of the day:

[Thu Aug 19 14:01:00 2004] [notice] mpmstats: rdy 712 bsy 312 rd 121 wr 173 ka 0 log 0 dns 0 cls 18
[Thu Aug 19 14:02:30 2004] [notice] mpmstats: rdy 809 bsy 215 rd 131 wr 44 ka 0 log 0 dns 0 cls 40
[Thu Aug 19 14:04:01 2004] [notice] mpmstats: rdy 707 bsy 317 rd 193 wr 97 ka 0 log 0 dns 0 cls 27
[Thu Aug 19 14:05:32 2004] [notice] mpmstats: rdy 731 bsy 293 rd 196 wr 39 ka 0 log 0 dns 0 cls 58
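Scanning the error log by hand becomes tedious over a full day. As a minimal sketch, the Python code below parses mpmstats entries in the format of the sample entries and reports the highest `bsy` value; the function name is hypothetical, and the parser assumes the counters always appear as name/value pairs after `mpmstats:`.

```python
import re
from typing import Dict, List

MPMSTATS = re.compile(r"mpmstats: (.*)$")


def parse_mpmstats(line: str) -> Dict[str, int]:
    """Turn 'rdy 712 bsy 312 ...' into {'rdy': 712, 'bsy': 312, ...}."""
    match = MPMSTATS.search(line)
    if match is None:
        return {}
    fields = match.group(1).split()
    return {name: int(value) for name, value in zip(fields[0::2], fields[1::2])}


log_lines: List[str] = [
    "[Thu Aug 19 14:01:00 2004] [notice] mpmstats: rdy 712 bsy 312 rd 121 wr 173 ka 0 log 0 dns 0 cls 18",
    "[Thu Aug 19 14:02:30 2004] [notice] mpmstats: rdy 809 bsy 215 rd 131 wr 44 ka 0 log 0 dns 0 cls 40",
    "[Thu Aug 19 14:04:01 2004] [notice] mpmstats: rdy 707 bsy 317 rd 193 wr 97 ka 0 log 0 dns 0 cls 27",
    "[Thu Aug 19 14:05:32 2004] [notice] mpmstats: rdy 731 bsy 293 rd 196 wr 39 ka 0 log 0 dns 0 cls 58",
]

# Highest number of busy threads seen in the sampled interval.
peak_busy = max(parse_mpmstats(line)["bsy"] for line in log_lines)
print(peak_busy)  # 317
```

In practice the lines would be read from the error log file, skipping entries for which `parse_mpmstats` returns an empty dictionary.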

Note that if the web server has not been configured to support enough simultaneous connections, one of the following messages will be logged to the web server error log and clients will experience delays accessing the server.

Windows

[warn] Server ran out of threads to serve requests. Consider raising the ThreadsPerChild setting

Linux and Unix

[error] server reached MaxClients setting, consider raising the MaxClients setting

Check the error log for a message like this to determine if the IBM HTTP Server configuration needs to be changed.
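This check can be automated with a simple text search. The Python sketch below filters log lines for the two messages quoted above; the function name and the sample log lines are illustrative, and the match relies on the message wording staying as shown.

```python
from typing import Iterable, List

# Message fragments taken from the Windows and Linux/Unix messages above.
CAPACITY_MESSAGES = (
    "Server ran out of threads to serve requests",
    "server reached MaxClients setting",
)


def find_capacity_warnings(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report an exhausted thread or process limit."""
    return [
        line for line in lines
        if any(message in line for message in CAPACITY_MESSAGES)
    ]


# Illustrative log excerpt; only the second line reports a capacity problem.
sample_log = [
    "[notice] mpmstats: rdy 712 bsy 312 rd 121 wr 173 ka 0 log 0 dns 0 cls 18",
    "[error] server reached MaxClients setting, consider raising the MaxClients setting",
]

for warning in find_capacity_warnings(sample_log):
    print(warning)
```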

Once the maximum number of simultaneous connections has been determined, add 25% as a safety factor. The next section discusses how to use this number in the web server configuration file.
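As a worked example of the safety factor, the following Python sketch applies the 25% margin to an observed peak. The function name is hypothetical, and 317 is simply the highest `bsy` value in the sample mpmstats entries shown earlier.

```python
import math


def connections_to_configure(observed_peak: int, safety_factor: float = 0.25) -> int:
    """Observed peak plus a safety margin, rounded up to a whole connection."""
    return math.ceil(observed_peak * (1 + safety_factor))


print(connections_to_configure(317))   # 397
print(connections_to_configure(2000))  # 2500
```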

Note: Setting of the KeepAliveTimeout can affect the apparent number of simultaneous requests being processed by the server. Increasing KeepAliveTimeout effectively reduces the number of threads available to service new inbound requests, and will result in a higher maximum number of simultaneous connections which must be supported by the web server. Decreasing KeepAliveTimeout can drive extra load on the server handling unnecessary TCP connection setup overhead. A setting of 5 to 10 seconds is reasonable for serving requests over high speed, low latency networks.

Looking at output from mod_mpmstats (2.0.42.2 and later) can help see if KeepAliveTimeout is set correctly.

For example:

[Thu Aug 28 10:12:17 2008] [notice] mpmstats: rdy 0 bsy 600 rd 1 wr 70 ka 484 log 0 dns 0 cls 45

shows that all threads are busy (0 threads are ready "rdy", 600 are busy "bsy"). 484 of them are just waiting for another keepalive request ("ka"), and yet the server will be rejecting requests because it has no threads available to work on them. Lowering KeepAliveTimeout would cause those threads to close their connections sooner and become available for more work.
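A check like the one described above could be automated roughly as follows. The function names and the 50% threshold are assumptions made for this sketch; they are not part of mod_mpmstats.

```python
import re

# Counter names as they appear in an mpmstats entry. "rdy" is listed before
# "rd" so that the longer name is tried first.
FIELD_RE = re.compile(r"\b(rdy|bsy|rd|wr|ka|log|dns|cls) (\d+)")

SAMPLE_LINE = (
    "[Thu Aug 28 10:12:17 2008] [notice] mpmstats: "
    "rdy 0 bsy 600 rd 1 wr 70 ka 484 log 0 dns 0 cls 45"
)


def parse_mpmstats(line):
    """Return the counters of one mpmstats entry as a dict of ints."""
    _, _, stats = line.partition("mpmstats:")
    return {name: int(value) for name, value in FIELD_RE.findall(stats)}


def keepalive_saturated(line, threshold=0.5):
    """True when no thread is ready and at least `threshold` of the busy
    threads are only waiting for another keepalive request.

    The threshold is a hypothetical value chosen for illustration.
    """
    stats = parse_mpmstats(line)
    if stats.get("bsy", 0) == 0:
        return False
    return stats["rdy"] == 0 and stats["ka"] / stats["bsy"] >= threshold


if __name__ == "__main__":
    print(parse_mpmstats(SAMPLE_LINE))
    print(keepalive_saturated(SAMPLE_LINE))
```

When the check fires repeatedly, lowering KeepAliveTimeout is the first thing to try, as described above.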

2.1. TCP connection states and thread/process requirements

The netstat command can be used to show the state of TCP connections between clients and IBM HTTP Server. For some of these connection states, a web server thread (or child process, with 1.3.x on Unix) is consumed. For other states, no web server thread is consumed. See the following table to determine if a TCP connection in a particular state requires a web server thread.

TCP state	meaning	is a web server thread utilized?
LISTEN	no connection	no
SYN_RCVD	not ready to be processed	no
ESTABLISHED	ready for web server to accept and process requests, or already processing requests	yes, as soon as the web server realizes that connection is established; but if there aren't enough configured web server threads (e.g., MaxClients is too small), the connection may stall until a thread becomes ready
FIN_WAIT1	web server has closed the socket	no
CLOSE_WAIT	client has closed the socket, web server hasn't yet noticed	yes
LAST_ACK	client closed socket then web server closed socket	no
FIN_WAIT2	web server closed the socket then client ACKed; the connection remains in this state until a FIN is received from the client or an OS-specific timeout occurs; see Connections in the FIN_WAIT_2 state and Apache for more information	A web server thread can be utilized for up to two seconds in this state if FIN is not received from the client, after which the web server gives up and the web server thread is no longer utilized.
TIME_WAIT	waiting for 2*MSL timeout before allowing quad to be reused	no
CLOSING	web server and client closed at the same time	no
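The table can be applied to netstat output with a short script. The sketch below assumes Linux-style "netstat -an" lines, with the local address in the fourth column written as address:port and the state in the last column; other platforms format the output differently, so the parsing would need adjusting. The sample lines and function name are made up for illustration.

```python
from collections import Counter

# States from the table in which a web server thread is in use.
THREAD_STATES = ("ESTABLISHED", "CLOSE_WAIT")
# FIN_WAIT2 can hold a thread for up to two seconds, so it is reported
# separately. Linux prints this state as FIN_WAIT2.
BRIEF_THREAD_STATES = ("FIN_WAIT2",)

# Hypothetical sample output, not taken from a real system.
SAMPLE_NETSTAT = [
    "tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN",
    "tcp 0 0 10.0.0.5:80 10.0.0.9:51234 ESTABLISHED",
    "tcp 0 0 10.0.0.5:80 10.0.0.9:51235 ESTABLISHED",
    "tcp 0 0 10.0.0.5:80 10.0.0.7:40100 TIME_WAIT",
    "tcp 0 0 10.0.0.5:80 10.0.0.7:40101 CLOSE_WAIT",
    "tcp 0 0 10.0.0.5:80 10.0.0.8:40200 FIN_WAIT2",
    "tcp 0 0 10.0.0.5:22 10.0.0.3:50000 ESTABLISHED",
]


def thread_consuming_connections(netstat_lines, port=80):
    """Return (threads in use, threads possibly held briefly) for one port."""
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[-1]
        if local_address.endswith(suffix):
            counts[state] += 1
    in_use = sum(counts[state] for state in THREAD_STATES)
    brief = sum(counts[state] for state in BRIEF_THREAD_STATES)
    return in_use, brief


if __name__ == "__main__":
    print(thread_consuming_connections(SAMPLE_NETSTAT))
```

The first number is an upper bound: as the table notes, an ESTABLISHED connection may still be waiting for a thread if too few are configured.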

2.2. Handling enough simultaneous connections with IBM HTTP Server on Windows

IBM HTTP Server on Windows has a Parent process and a single multi-threaded Child process.

On 64-bit Windows operating systems, each instance of IBM HTTP Server is limited to approximately 2500 ThreadsPerChild. On 32-bit Windows, this number is closer to 5000. These numbers are not exact limits, because the real limits are the sum of the fixed startup cost of memory for each thread + the maximum runtime memory usage per thread, which varies based on configuration and workload. Raising ThreadsPerChild and approaching these limits risks child process crashes when runtime memory usage puts the process address space over the 2GB or 3GB barrier.

Relevant config directives on Windows:

ThreadsPerChild

The ThreadsPerChild directive places an upper limit on the number of simultaneous connections the server can handle. ThreadsPerChild should be set according to the expected load.

ThreadLimit (2.0 and above)

ThreadsPerChild has a built in upper limit. Use ThreadLimit to increase the upper limit of ThreadsPerChild. The value of ThreadLimit affects the size of the shared memory segment the server uses to perform inter-process communication between the parent and the single child process. Do not increase ThreadLimit beyond what is required for ThreadsPerChild.

Recommended settings:

Directive	Value
ThreadsPerChild	maximum number of simultaneous connections
ThreadLimit	same as ThreadsPerChild (2.0 and above)

2.3. Handling enough simultaneous connections with IBM HTTP Server 2.0 and above on Linux and Unix systems

On UNIX and Linux platforms, a running instance of IBM HTTP Server consists of one single-threaded Parent process which starts and maintains one or more multi-threaded Child processes. HTTP requests are received and processed by threads running in the Child processes. Each simultaneous request (TCP connection) consumes a thread. Use the appropriate configuration directives to control how many threads the server starts to handle requests; on UNIX and Linux, you can also control how the threads are distributed amongst the Child processes.

Relevant config directives on UNIX platforms:

StartServers

The StartServers directive controls how many Child Processes are started when the web server initializes. The recommended value is 1. Do not set this higher than MaxSpareThreads divided by ThreadsPerChild. Otherwise, processes will be started at initialization and terminated immediately thereafter.

Every second, IHS checks if new child processes are needed, so generally tuning of StartServers will be moot as early as a minute after IHS has started.

ServerLimit

There is a built-in upper limit on the number of child processes. At runtime, the actual upper limit on the number of child processes is MaxClients divided by ThreadsPerChild.

This should only be changed when you have reason to change MaxClients or ThreadsPerChild. It does not directly dictate the number of child processes created at runtime.

It is possible to see more child processes than this if some of them are gracefully stopping. If there are many of them, it probably means that MaxSpareThreads is set too small, or that MaxRequestsPerChild is non-zero and not large enough; see below for more information on both these directives.

ThreadsPerChild

Use the ThreadsPerChild directive to control how many threads each Child process starts. More information on strategies for distributing threads amongst child processes is included below.

ThreadLimit

ThreadsPerChild has a built in upper limit. Use ThreadLimit to increase the upper limit of ThreadsPerChild. The value of ThreadLimit affects the size of the shared memory segment the server uses to perform inter-process communication between the parent and child processes. Do not increase ThreadLimit beyond what is required for ThreadsPerChild.

MaxClients

The MaxClients directive places an upper limit on the number of simultaneous connections the server can handle. MaxClients should be set according to the expected load.

The MaxSpareThreads and MinSpareThreads directives affect how the server reacts to changes in server load. You can use these directives to instruct the server to automatically increase the number of Child processes when server load increases (subject to limits imposed by ServerLimit and MaxClients) and to decrease the number of Child processes when server load is low. This feature can be useful for managing overall system memory utilization when your server is being used for tasks other than serving HTTP requests.

Setting MaxSpareThreads to a relatively small value has a performance penalty: Extra CPU to terminate and create child processes. During normal operation, the load on the server may vary widely (e.g., from 150 busy threads to 450 busy threads). If MaxSpareThreads is smaller than this variance (e.g., 450-150=300), then the web server will terminate and create child processes frequently, resulting in reduced performance.
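The comparison in the previous paragraph is simple arithmetic. A minimal sketch, with a hypothetical function name and busy-thread samples taken from the example figures above:

```python
def spare_thread_churn_likely(max_spare_threads, busy_samples):
    """True when MaxSpareThreads is smaller than the observed swing in
    busy threads, the situation in which child processes are terminated
    and recreated frequently."""
    swing = max(busy_samples) - min(busy_samples)
    return max_spare_threads < swing


if __name__ == "__main__":
    # Load varying between 150 and 450 busy threads gives a swing of 300.
    print(spare_thread_churn_likely(75, [150, 300, 450]))
    print(spare_thread_churn_likely(600, [150, 300, 450]))
```

The busy_samples values would normally come from the bsy counter in mod_mpmstats entries collected over a representative period.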

Recommended settings:

Directive	Value
ThreadsPerChild	Leave at the default value, or increase to a larger proportion of MaxClients for better coordination of WebSphere Plugin processing threads (via fewer child processes). Larger ThreadsPerChild (and fewer processes) also results in fewer dedicated web container threads being used by the ESI invalidation feature of the WebSphere Plugin.
MaxClients	maximum number of simultaneous connections, rounded up to an even multiple of ThreadsPerChild
StartServers	2
MinSpareThreads	The greater of 25 or 10% of MaxClients. Since IHS checks this value approximately once per second, MinSpareThreads should safely exceed the number of new requests you might receive in a second. Setting MinSpareThreads too high with the WebSphere Plugin may trigger premature spawning of new child processes, over which AppServer requests will then be distributed with no sharing of MaxConnections counts or markdowns. If the ESI invalidation servlet is configured in the WebSphere Plugin, each additional process results in a dedicated web container thread being consumed. Setting MinSpareThreads too low may induce delays of a few seconds if IHS runs out of processing threads.
MaxSpareThreads	There are multiple approaches to the tuning of this directive: Preallocation: The system must have enough resources to handle MaxClients anyway, so let the web server retain idle threads/processes so that they are immediately ready to serve requests when load increases again. Set MaxSpareThreads to the same value as MaxClients. This approach should be used if there are extremely long-running application requests that would keep child processes from being able to terminate gracefully. Reduce web server resource utilization during idle periods and increase the coordination between WebSphere Plugin threads: Allow the web server to clean up idle threads after load subsides so that the resources can be used for other applications. When the load increases again, it will reclaim the resources as it creates new child processes. Set MaxSpareThreads to 25-30% of MaxClients. If it is too small a fraction of MaxClients, child processes will be terminated and recreated frequently.
ServerLimit	MaxClients divided by ThreadsPerChild, or the default if that is high enough
ThreadLimit	ThreadsPerChild

Note: ThreadLimit and ServerLimit need to appear before these other directives in the configuration file.
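The recommendations in the table can be turned into numbers with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, ThreadsPerChild is left at 25, StartServers follows the table, and the non-preallocating variant uses 30% of MaxClients (the upper end of the 25-30% range).

```python
def worker_settings(peak_connections, threads_per_child=25, preallocate=True):
    """Derive directive values from the required number of simultaneous
    connections (measured peak plus safety factor)."""
    # Round MaxClients up to a whole multiple of ThreadsPerChild.
    children = -(-peak_connections // threads_per_child)
    max_clients = children * threads_per_child
    # The greater of 25 or 10% of MaxClients, rounded up.
    min_spare = max(25, (max_clients + 9) // 10)
    if preallocate:
        max_spare = max_clients
    else:
        max_spare = (max_clients * 30 + 99) // 100
    # ThreadLimit and ServerLimit come first, as the note above requires.
    return {
        "ThreadLimit": threads_per_child,
        "ServerLimit": children,
        "StartServers": 2,
        "MaxClients": max_clients,
        "MinSpareThreads": min_spare,
        "MaxSpareThreads": max_spare,
        "ThreadsPerChild": threads_per_child,
    }


def render(settings):
    """Format the settings as configuration lines, one directive per line."""
    return "\n".join("%s %d" % item for item in settings.items())


if __name__ == "__main__":
    print(render(worker_settings(397)))
```

For a requirement of 397 simultaneous connections this yields MaxClients 400, ServerLimit 16 and MinSpareThreads 40. The table allows keeping the default ServerLimit when it is already high enough; the sketch always emits the computed quotient.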

Default settings in recent default configuration files:

<IfModule worker.c>

ThreadLimit 25

ServerLimit 64

StartServers 2

MaxClients 600

MinSpareThreads 25

MaxSpareThreads 75

ThreadsPerChild 25

MaxRequestsPerChild 0

</IfModule>

2.3.1 If memory is constrained

If there is concern about available memory on the server, some additional tuning can be done.

Increasing ThreadsPerChild (and ThreadLimit) will reduce the number of total server processes needed, reducing the per-server memory overhead. However, there are a number of possible drawbacks to increasing ThreadsPerChild. Search this document for ThreadsPerChild and consider all the warnings before changing it.

Setting ThreadStackSize to e.g. 131072, and MaxMemFree to e.g. 512, will limit memory usage of each thread.

2.4. Handling enough simultaneous connections with IBM HTTP Server 1.3.x on Linux and Unix systems

IBM HTTP Server 1.3 on Linux and Unix systems uses one single-threaded child process per concurrent connection.

Recommended settings:

Directive	Value
MaxClients	maximum number of simultaneous connections
MinSpareServers	1
MaxSpareServers	same value as MaxClients
StartServers	default value

3. Out of the box tuning concerns

3.1. All platforms

MaxClients, ThreadsPerChild, etc.

Refer to the previous section.

cipher ordering (SSL only)

The default SSLCipherSpec ordering enables maximum strength SSL connections at a significant performance penalty. A much better performing and reasonably strong SSLCipherSpec configuration is given below.

Sendfile (non-SSL only)

With IBM HTTP Server 2.0 and above, Sendfile usage is disabled in the current default configuration files. This avoids some occasional platform-specific problems, but it may also increase CPU utilization on platforms on which sendfile is supported (Windows, AIX, Linux, HP-UX, and Solaris/x64).

If you enable sendfile usage on AIX, ensure that the nbc_limit setting displayed by the no program is not too high for your system. On many systems, the AIX system default is 768MB. We recommend setting this to a much more conservative value, such as 256MB. If the limit is too high, and the web server use of sendfile results in a large amount of network buffer cache memory utilization, a wide range of other system functions may fail. In situations like that, the best diagnostic step is to check network buffer cache utilization by running netstat -c. If it is relatively high (hundreds of megabytes), disable sendfile usage and see if the problem occurs again. Alternatively, nbc_limit can be lowered significantly while sendfile remains enabled.

Some Apache users on Solaris have noted that sendfile is slower than the normal file handling, and that sendfile may not function properly on that platform with ZFS or some Ethernet drivers. IBM HTTP Server provides support for sendfile on Solaris/x64 but not Solaris/SPARC.

3.2. AIX

With IBM HTTP Server 2.0.42 and above, the default IHSROOT/bin/envvars file specifies the setting MALLOCMULTIHEAP=considersize,heaps:8. This enables a memory management scheme for the AIX heap library which is better for multithreaded applications, and configures it to try to minimize memory use and to use a moderate number of heaps. For configurations with extensive heap operations (SSL or certain third-party modules), CPU utilization can be lowered by changing this setting to the following: MALLOCMULTIHEAP=true. This may increase the memory usage slightly.

3.3. Windows

The Fast Response Cache Accelerator (FRCA, aka AFPA) is disabled in the current default configuration files because some common Windows extensions, such as Norton Antivirus, are not compatible with it. FRCA is a kernel resident micro HTTP server optimized for serving static, non-access protected files directly out of the file system. The use of FRCA can dramatically reduce CPU utilization in some configurations. FRCA cannot be used for serving content over HTTPS/SSL connections.

4. Configuration features to avoid

IBM HTTP Server supports some features and configuration directives that can have a severe impact on server performance. Use of these features should be avoided unless there are compelling reasons to enable them.

HostnameLookups On

Performance penalty: Extra DNS lookups per request.

This is disabled by default in the sample configuration files.

IdentityCheck On

Performance penalty: Delays introduced in the request to contact RFC 1413 ident daemon possibly running on client machine

This is disabled by default in the sample configuration files.

mod_mime_magic

Performance penalty: Extra CPU and disk I/O to try to find the file type

This is disabled by default in the sample configuration files.

ContentDigest On (1.3 only)

Performance penalty: Extra CPU to compute MD5 hash of the response

This is disabled by default in the sample configuration files.

setting MaxRequestsPerChild to non-zero

Performance penalty:

Extra CPU to terminate and create child processes
With IHS 2 or higher on Linux and Unix, this can lead to an excessive number of child processes, which in turn can lead to excessive swap space usage. Once a child process reaches MaxRequestsPerChild it will not handle any new connections, but existing connections are allowed to complete. In other words, only one long-running request in the process will keep the process active, sometimes indefinitely. In environments where long-running requests are not unusual, a large number of exiting child processes can build up.

This is set to the optimal setting (0) in default configuration files for recent releases.

In rare cases, IHS support will recommend setting MaxRequestsPerChild to non-zero to work around a growth in resources, based on an understanding of what type of resource is growing in use, and what other mechanisms are available to address that growth.

With IBM HTTP Server 1.3 on Linux and Unix, a setting of a high value such as 10000 is not a concern. The child processes each handle only a single connection, so they cannot be prevented from exiting by long-running requests.

With IBM HTTP Server 2.0 and above on Linux and Unix, if the feature must be used, then only set it to a relatively high value such as 50000 or more to limit the risk of building up a large number of child processes which are trying to exit but which can't because of a long-running request which has not completed.

.htaccess files

Performance penalty: Extra CPU and disk I/O to locate .htaccess files in directories where static files are served

.htaccess files are disabled in the sample configuration files.

detailed logging

Detailed logging (SSLTrace, plug-in LogLevel=trace, GSKit trace, third-party module logging) is often enabled as part of problem diagnosis. When one or more of these traces is left enabled after the problem is resolved, CPU utilization is higher than normal.

Detailed logging is disabled in the sample configuration files.

disabling Options FollowSymLinks

If the static files are maintained by untrusted users, you may want to disable this option in the configuration file, in order to prevent those untrusted users from creating symbolic links to private files that should not ordinarily be served. But disabling FollowSymLinks to prevent this problem will result in performance degradation since the web server then has to check every component of the pathname to determine if it is a symbolic link.

Following symbolic links is enabled in the sample configuration files.

5. Common configuration changes and their Implications

5.1. IBM HTTP Server 2.0 and above on Linux and Unix systems: ThreadsPerChild

This directive is commonly modified as part of tuning the web server. There are advantages and disadvantages for different values of ThreadsPerChild:

Higher values for ThreadsPerChild result in lower overall memory use for the server, as long as the value of ThreadsPerChild isn't higher than the normal number of concurrent TCP connections handled by the server.
Extremely high values for ThreadsPerChild may result in encountering address space limitations.
Higher values for ThreadsPerChild often result in fewer connections that the WebSphere plug-in maintains to the application server, and in better sharing of markdown information.
Higher values for ThreadsPerChild result in higher CPU utilization for SSL processing.
On older Linux distributions such as RedHat Advanced Server 2.1 and SuSE SLES 8 which use the linuxthreads library, higher values for ThreadsPerChild result in higher CPU utilization in the threads library.

Some features may exacerbate this problem, such as RewriteMap or the following modules: mod_mem_cache, mod_ibm_ldap, or mod_ext_filter.

Higher ThreadsPerChild results in a more effective use of the cache and connection pooling in mod_ibm_ldap.
Higher ThreadsPerChild results in a more effective use of the cache in mod_mem_cache, because each child must fill its own cache.

MaxSpareThreads = MaxClients is also beneficial for mod_mem_cache because it prevents child processes that have built up large caches from being gracefully terminated.

System tuning changes may be necessary to run with higher values for ThreadsPerChild. If IBM HTTP Server fails to start after increasing ThreadsPerChild, check the error log for any error messages. A common failure is a failed attempt to create a worker thread.

apr_thread_create: unable to create worker thread error message and tuning hints

5.2. IBM HTTP Server 2.0 and above on Linux and Unix systems: MaxClients

This directive is commonly modified as part of tuning the web server to handle a greater client load (more concurrent TCP connections).

When MaxClients is increased, the value for MaxSpareThreads should be scaled up as well. Otherwise, extra CPU will be spent terminating and creating child processes when the load changes by a relatively small amount.

5.3. ExtendedStatus

This directive controls whether some important information is saved in the scoreboard for use by mod_status and diagnostic modules. When this is set to On, web server CPU usage may increase by as much as one percent. However, it can make mod_status reports and some other diagnostic tools more useful.

6. WebSphere plug-in concerns on Linux and Unix systems

6.1 Tuning IHS to make the MaxConnections parameter more effective

The use of the MaxConnections parameter in the WebSphere plug-in configuration is most effective when IBM HTTP Server 2.0 and above is used and there is a single IHS child process. However, there are other tradeoffs:

linuxthreads (traditional pthread library on Linux): ThreadsPerChild greater than about 100 results in high CPU overhead
SSL on any platform: ThreadsPerChild greater than about 100 results in high CPU overhead
WebSphere 5.x plug-in has a file descriptor limitation which will be encountered on Linux and Solaris if ThreadsPerChild is greater than 500

Using MaxConnections with more than one child process introduces a number of complications. Each IHS child process must have a high enough MaxConnections value to allow each thread to be able to find a backend server, but in aggregate the child processes should not be able to overrun an individual application server.

Choosing a value for MaxConnections

MaxConnections has no effect if it exceeds ThreadsPerChild, because no child could try to use that many connections in the first place.
Upper limit

If you are concerned about a single HTTP Server overloading an Application server, you must first determine "N" -- the maximum number of requests the single AppServer can handle.

MaxConnections would then be N / (MaxClients / ThreadsPerChild), that is, N divided by the maximum number of child processes based on your configuration. This represents the worst-case number of connections by IHS to a single Application Server. As the number of backends grows, the likelihood of the worst-case scenario decreases, as even the uncoordinated child processes are still distributing load with respect to session affinity and load balancing.

For example, if you wish to restrict each Application Server to a total of 200 connections, spread out among 4 child processes, you must set the MaxConnections parameter to 50 because each child process keeps its own count.

Lower Limit

If MaxConnections is too small, a child process may start returning errors because it has no AppServers to use.

To prevent problems, MaxConnections * (number of usable backend servers) should exceed ThreadsPerChild.

For example, if each child process has 128 ThreadsPerChild and MaxConnections is only 50 with two backend AppServers, a single child process may not be able to fulfill all 128 requests because only 50 * 2 connections can be made.
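The upper and lower limits can be checked together with a few lines of Python. The function names are hypothetical, and MaxClients 512 is an assumed value chosen so that 128 ThreadsPerChild gives the four child processes used in the examples above.

```python
def max_child_processes(max_clients, threads_per_child):
    """Maximum number of child processes: MaxClients / ThreadsPerChild,
    rounded up."""
    return -(-max_clients // threads_per_child)


def max_connections_upper_limit(app_server_capacity, max_clients, threads_per_child):
    """Largest per-child MaxConnections that keeps the worst-case total
    for one application server within its capacity N."""
    children = max_child_processes(max_clients, threads_per_child)
    return app_server_capacity // children


def max_connections_is_enough(max_connections, usable_backends, threads_per_child):
    """True when MaxConnections * (usable backends) exceeds ThreadsPerChild,
    so every thread in a child process can find a backend server."""
    return max_connections * usable_backends > threads_per_child


if __name__ == "__main__":
    upper = max_connections_upper_limit(200, max_clients=512, threads_per_child=128)
    print(upper)
    print(max_connections_is_enough(upper, usable_backends=2, threads_per_child=128))
```

With these figures the upper limit is 50, but 50 * 2 = 100 connections cannot serve 128 threads, so the two limits conflict. Resolving that means more backend servers, fewer threads per child process, or accepting a higher worst case per application server.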

To use MaxConnections, IHS should be configured to use a small, fixed number of child processes, and to not vary them in response to a change in load. This provides a consistent, predictable number of child processes that each have a fixed MaxConnections parameter.

MinSpareThreads and MaxSpareThreads should be set to the same value as MaxClients.
StartServers should be set to MaxClients / ThreadsPerChild.

When more than one child process is configured (the number of child processes is MaxClients / ThreadsPerChild), setting MaxSpareThreads equal to MaxClients can have the effect of keeping multiple child processes alive when they aren't strictly needed. This can be considered detrimental to the WebSphere Plugin detecting markdowns, because the threads in each child process must independently discover that a server should be marked down. See section 6.2 below.

6.2 Tuning IHS for efficiency of Plugin markdown handling

Only WebSphere Plugin threads in a single IHS child process share info about AppServer markdowns, so some customers wish to aggressively limit the number of child processes that are running at any given time. If a user has problems with markdowns being discovered by many different child processes in the same webserver, consider increasing ThreadsPerChild and reducing MinSpareThreads and MaxSpareThreads as detailed below.

One approach is to use a single child process, where MaxClients and ThreadsPerChild are set to the same value. IHS will never create or destroy child processes in response to load.

Cautions

A WebServer crash impacts 100% of the clients.
Some types of hangs may influence 100% of the clients.
CPU usage may increase if SSL is used and ThreadsPerChild exceeds a few hundred.
Further ramifications of a high ThreadsPerChild value are discussed in section 5.1.

A second approach is to use a variable number of child processes, but to aggressively limit the number created by IHS in response to demand (and aggressively remove unneeded processes). This is accomplished by setting ThreadsPerChild to 25% or 50% of MaxClients and setting MinSpareThreads and MaxSpareThreads low (relative to the recommendations in section 2.3).

Cautions:

MaxSpareThreads < MaxClients causes IHS to routinely kill off child processes, however it may take some time for these processes to exit while slow requests finish processing.
A lower MaxSpareThreads can cause extra CPU usage for the creation of replacement child processes.
Caches for ESI and mod_mem_cache are thrown away when child processes exit.

See also

High CPU in child processes after WebSphere plugin config is updated

6.3 Tuning IHS for efficiency of ESI invalidation servlet / web container threads

As the number of child processes increases (ratio of ThreadsPerChild / MaxClients shrinks), if the ESI Invalidation Servlet is used with the WebSphere Plugin, more and more Web Container threads will be permanently consumed. Each child process uses 1 ESI Invalidation thread (when the feature is configured), and this thread is used synchronously in the web container.

This requires careful consideration of the number of child processes per webserver, the number of webservers, and the number of configured Web Container threads.
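The interaction between those three numbers can be estimated as below. This is a worst-case sketch that assumes every web server is running its maximum number of child processes; the function names and the pool size of 50 are illustrative values, not taken from this document.

```python
def esi_invalidation_threads(max_clients, threads_per_child, web_servers):
    """Worst-case web container threads held by ESI invalidation:
    one per child process, across all web servers."""
    children_per_server = -(-max_clients // threads_per_child)
    return children_per_server * web_servers


def web_container_threads_left(pool_size, max_clients, threads_per_child, web_servers):
    """Web container threads remaining for ordinary requests."""
    held = esi_invalidation_threads(max_clients, threads_per_child, web_servers)
    return pool_size - held


if __name__ == "__main__":
    # Default MaxClients 600 and ThreadsPerChild 25, two web servers.
    print(web_container_threads_left(50, 600, 25, 2))
    # Same load capacity with ThreadsPerChild raised to 100.
    print(web_container_threads_left(50, 600, 100, 2))
```

With the default 25 ThreadsPerChild, two web servers could hold 48 of the assumed 50 threads. Raising ThreadsPerChild to 100 reduces that to 12.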

7. SSL Performance

7.1. ciphers

When an SSL connection is established, the client (web browser) and the web server negotiate the cipher to use for the connection. The web server has an ordered list of ciphers, and the first cipher in that list which is supported by the client will be selected.

By default, IBM HTTP Server prefers AES and RC4 ciphers over the computationally expensive Triple-DES (3DES) cipher suite, and tuning of the order of SSL directives for performance reasons is generally not needed.

IBM HTTP Server supports the following SSL ciphers:

SSL V2:

shortname	longname	Meaning	Strength
27	SSL_DES_192_EDE3_CBC_WITH_MD5	Triple-DES (168 bit)	(stronger)
21	SSL_RC4_128_WITH_MD5	RC4 (128 bit)
23	SSL_RC2_CBC_128_CBC_WITH_MD5	RC2 (128 bit)
26	SSL_DES_64_CBC_WITH_MD5	DES (56 bit)
22	SSL_RC4_128_EXPORT40_WITH_MD5	RC4 (40 bit)
24	SSL_RC2_CBC_128_CBC_EXPORT40_WITH_MD5	RC2 (40 bit)	(weaker)

SSL V3 and TLSV1:

shortname	longname	Meaning	Strength
3A	SSL_RSA_WITH_3DES_EDE_CBC_SHA	Triple-DES SHA (168 bit)	(stronger)
35b	TLS_RSA_WITH_AES_256_CBC_SHA	AES SHA (256 bit)
35	SSL_RSA_WITH_RC4_128_SHA	RC4 SHA (128 bit)
34	SSL_RSA_WITH_RC4_128_MD5	RC4 MD5 (128 bit)
2F	TLS_RSA_WITH_AES_128_CBC_SHA	AES SHA (128 bit)
39	SSL_RSA_WITH_DES_CBC_SHA	DES SHA (56 bit)
62	TLS_RSA_EXPORT1024_WITH_RC4_56_SHA	RC4 SHA (56 bit)
64	TLS_RSA_EXPORT1024_WITH_DES_CBC_SHA	DES SHA (56 bit)
33	SSL_RSA_EXPORT_WITH_RC4_40_MD5	RC4 MD5 (40 bit)
36	SSL_RSA_EXPORT_WITH_RC2_CBC_40_MD5	RC2 MD5 (40 bit)	(weaker)
32	SSL_RSA_WITH_NULL_SHA
31	SSL_RSA_WITH_NULL_MD5
30	SSL_NULL_WITH_NULL_NULL

FIPS Approved NIST SSLV3 and TLSV1 (only available with SSLFIPSEnable):

shortname	longname	Meaning	Strength
3A	SSL_RSA_WITH_3DES_EDE_CBC_SHA	Triple-DES SHA (168 bit)	(stronger)
35b	TLS_RSA_WITH_AES_256_CBC_SHA	AES SHA (256 bit)
2F	TLS_RSA_WITH_AES_128_CBC_SHA	AES SHA (128 bit)

The following configuration directs the server to prefer strong 128-bit RC4 ciphers first and will provide a significant performance improvement over the default configuration. This configuration does not support the weaker 40-bit, 56-bit, or NULL/Plaintext ciphers that security scanners may complain about.

The order of the SSLCipherSpec directives dictates the priority of the ciphers, so we order them in a way that will cause IHS to prefer less CPU intensive ciphers. SSLv2 is disabled implicitly by not including any SSLv2 ciphers.

<VirtualHost *:443>

SSLEnable

Keyfile keyfile.kdb

## SSLv3 128 bit Ciphers

SSLCipherSpec SSL_RSA_WITH_RC4_128_MD5

SSLCipherSpec SSL_RSA_WITH_RC4_128_SHA

## FIPS approved SSLV3 and TLSv1 128 bit AES Cipher

SSLCipherSpec TLS_RSA_WITH_AES_128_CBC_SHA

## FIPS approved SSLV3 and TLSv1 256 bit AES Cipher

SSLCipherSpec TLS_RSA_WITH_AES_256_CBC_SHA

## Triple DES 168 bit Ciphers

## These can still be used, but only if the client does

## not support any of the ciphers listed above.

SSLCipherSpec SSL_RSA_WITH_3DES_EDE_CBC_SHA

## The following block enables SSLv2. Excluding it in the presence of

## the SSLv3 configuration above disables SSLv2 support.

## Uncomment to enable SSLv2 (with 128 bit Ciphers)

#SSLCipherSpec SSL_RC4_128_WITH_MD5

#SSLCipherSpec SSL_RC4_128_WITH_SHA

#SSLCipherSpec SSL_DES_192_EDE3_CBC_WITH_MD5

</VirtualHost>

You can use the following LogFormat directive to view and log the SSL cipher negotiated for each connection:

LogFormat "%h %l %u %t \"%r\" %>s %b \"SSL=%{HTTPS}e\" \"%{HTTPS_CIPHER}e\" \"%{HTTPS_KEYSIZE}e\" \"%{HTTPS_SECRETKEYSIZE}e\"" ssl_common

CustomLog logs/ssl_cipher.log ssl_common

This logformat will produce an output to the ssl_cipher.log that looks something like this:

127.0.0.1 - - [18/Feb/2005:10:02:05 -0500] "GET / HTTP/1.1" 200 1582 "SSL=ON" "SSL_RSA_WITH_RC4_128_MD5" "128" "128"
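Once ssl_cipher.log is being written, a short script can show which ciphers clients actually negotiate. The sketch below assumes the ssl_common format defined above, with the four SSL fields at the end of each line; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the four quoted SSL fields that end an ssl_common log line.
# Anchoring at the end of the line avoids quotes inside the request string.
TAIL_RE = re.compile(r'"SSL=([^"]*)" "([^"]*)" "([^"]*)" "([^"]*)"\s*$')

SAMPLE_LINE = (
    '127.0.0.1 - - [18/Feb/2005:10:02:05 -0500] "GET / HTTP/1.1" 200 1582 '
    '"SSL=ON" "SSL_RSA_WITH_RC4_128_MD5" "128" "128"'
)


def parse_ssl_fields(line):
    """Return (ssl flag, cipher, key size, secret key size), or None when
    the line does not end with the four SSL fields."""
    match = TAIL_RE.search(line)
    return match.groups() if match else None


def tally_ciphers(lines):
    """Count requests per negotiated cipher, ignoring non-SSL lines."""
    counts = Counter()
    for line in lines:
        fields = parse_ssl_fields(line)
        if fields and fields[0] == "ON" and fields[1]:
            counts[fields[1]] += 1
    return counts


if __name__ == "__main__":
    print(parse_ssl_fields(SAMPLE_LINE))
    print(tally_ciphers([SAMPLE_LINE, SAMPLE_LINE]))
```

A tally dominated by Triple-DES would suggest that the cipher ordering is not taking effect, or that clients do not support the cheaper ciphers.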

7.2. Server certificate size

Larger server certificates are also costly. Every doubling of key size costs 4-8 times more CPU for the required computation.
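The rule of thumb compounds quickly, as a small calculation shows. The function name is hypothetical, and the estimate is only as good as the 4-8 times per doubling figure quoted above.

```python
import math


def relative_handshake_cost(old_bits, new_bits):
    """Return (low, high) estimates of how much more CPU the key
    computation needs, assuming 4x to 8x per doubling of key size."""
    doublings = math.log2(new_bits / old_bits)
    return 4 ** doublings, 8 ** doublings


if __name__ == "__main__":
    print(relative_handshake_cost(1024, 2048))
    print(relative_handshake_cost(1024, 4096))
```

Moving from 1024-bit to 2048-bit keys is one doubling (4 to 8 times the CPU), and moving to 4096-bit keys is two doublings (16 to 64 times).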

Unfortunately, you don't have a lot of choice in the size of your server certificate; the industry is currently (2010) moving from 1024-bit to 2048-bit certificates to keep up with the increasing compute power available to those trying to break SSL. But there are some SSL performance tuning tips that can help.

The primary cost of the computation associated with a larger server certificate size is in the SSL handshake when a new session is created, so using keep-alive and re-using SSL sessions can make a significant difference in performance. See more about that below.

7.3. Linux and Unix systems, IBM HTTP Server 2.0 and higher: ThreadsPerChild

The SSL CPU utilization will be lower with lower values of ThreadsPerChild. We recommend using a maximum of 100 if your server handles a lot of SSL traffic, so that the client load is spread among multiple child processes. (Note: This optimization is not possible on Windows, which supports only a single child process.)

7.4. AIX, IBM HTTP Server 2.0 and higher: MALLOCMULTIHEAP setting in IHSROOT/bin/envvars

Set this to the value true when there is significant SSL work-load, as this will result in better performance for the heap operations used by SSL processing.

7.5. Should I use a cryptographic accelerator?

The preferred approach to improving SSL performance is to use software tuning to the greatest extent possible. Installation and maintenance of crypto cards is relatively complex and usually results in a relatively small reduction in CPU usage. We have observed many situations where the improvement is less than 10%.

7.6. HTTP keep-alive and SSL

HTTP keep-alive has a much larger benefit for SSL than for non-SSL. If the goal is to limit the number of worker threads utilized for keep-alive handling, performance will be much better if KeepAlive is enabled with a small timeout for SSL-enabled virtual hosts, than if keep-alive is disabled altogether.

Example:

<VirtualHost *:443>

# (normal configuration directives go here)

# enable keepalive support, but with very small timeout

# to minimize the use of worker threads

KeepAlive On

KeepAliveTimeout 1

</VirtualHost>

Warning! We are not recommending "KeepAliveTimeout 1" in general. We are suggesting that this is much better than setting KeepAlive Off. Larger values for KeepAliveTimeout will result in slightly better SSL session utilization at the expense of tying up a worker thread for a longer period of time in case the browser sends in another request before the timeout is over. There are diminishing returns for larger values, and the optimal values are dependent upon the interaction between your application and client browsers.

7.7. SSL Sessions and Load Balancers

An SSL session is a logical connection between the client and web server for secure communications. During the establishment of the SSL session, public key cryptography is used to exchange a shared secret master key between the client and the server, and other characteristics of the communication, such as the cipher, are determined. Later data transfer over the session is encrypted and decrypted with symmetric key cryptography, using the shared key created during the SSL handshake.

The generation of the shared key is very CPU intensive. In order to avoid generating the shared key for every TCP connection, there is a capability to reuse the same SSL session for multiple connections. The client must request to reuse the same SSL session in the subsequent handshake, and the server must have the SSL session identifier cached. When these requirements are met, the handshake for the subsequent TCP connection requires far less server CPU (80% less in some tests). All web browsers in general use are able to reuse the same SSL session. Custom web clients sometimes do not have the necessary support, however.

The use of load balancers between web clients and web servers presents a special problem. IBM HTTP Server cannot share a session id cache across machines. Thus, the SSL session can be reused only if a subsequent TCP connection from the same client is sent by the load balancer to the same web server. If it goes to another web server, the session cannot be reused and the shared key must be regenerated, at great CPU expense.

Because of the importance of reusing the same SSL session, load balancer products generally provide the capability of establishing affinity between a particular web client and a particular web server, as long as the web client tries to reuse an existing SSL session. Without the affinity, subsequent connections from a client will often be handled by a different web server, which will require that a new shared key be generated because a new SSL session will be required.

Some load balancer products refer to this feature as SSL Sticky or Session Affinity. Other products may use their own terminology. It is important to activate the appropriate feature so that SSL sessions can be reused more often on subsequent TCP connections, which avoids unnecessary CPU usage in the web server.

End users will generally not be aware that SSL sessions are not being reused unless the overhead of continually negotiating new sessions causes excessive delay in responses. Web server administrators will generally only become aware of this situation when they observe the CPU utilization approaching 100%. The point at which this becomes noticeable will depend on the performance of the web server hardware, and whether or not a cryptographic accelerator is being used.

When SSL is being used and excessive web server CPU utilization is noticed, it is important to first confirm that Session Affinity is enabled if a load balancer is being used.

Checking the actual reuse of SSL sessions

First, get the number of new sessions and reused sessions. LogLevel must be set to info or debug.

IBM HTTP Server 2.0.42 or 2.0.47 up through cumulative fix PK07831, and IBM HTTP Server 6 up through 6.0.2, write messages of this format for each handshake:

[Sat Jul 09 10:37:22 2005] [info] New Session ID: 0

[Sat Jul 09 10:37:22 2005] [info] New Session ID: 1

0 means that an existing SSL session was re-used. 1 means that a new SSL session was created.

Getting the number of each type of handshake:

$ grep "New Session ID: 0" logs/error_log | wc -l

1115

$ grep "New Session ID: 1" logs/error_log | wc -l

163

IBM HTTP Server 2.0.42 or 2.0.47 with cumulative fix PK13230 or later, and IBM HTTP Server 6.0.2.1 and later, write messages of this format for each handshake:

[Sat Oct 01 15:30:17 2005] [info] [client 9.49.202.236] Session ID: YT8AAPUJ4gWir+U4v2mZFaw5KDlYWFhYyOM+QwAAAAA= (new)

[Sat Oct 01 15:30:32 2005] [info] [client 9.49.202.236] Session ID: YT8AAPUJ4gWir+U4v2mZFaw5KDlYWFhYyOM+QwAAAAA= (reused)

To get the relative stats:

$ grep "Session ID:.*(reused)" logs/error_log | wc -l

1115

$ grep "Session ID:.*(new)" logs/error_log | wc -l

163

The percentage of expensive handshakes for this test run is 163 / (1115 + 163), or 12.8%. To confirm that the load balancer is not impeding the reuse of SSL sessions, perform a load test with and without the load balancer*, and compare the percentage of expensive handshakes in both tests.

*Alternatively, use the load balancer for both tests, but for one load test have the load balancer send all connections to a particular web server, and for the other load test have it load balance between multiple web servers.
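The counting and percentage calculation above can be combined into one step. The following is a sketch only: it assumes the newer "(new)" / "(reused)" message format, and the function name ssl_handshake_stats and the default log path are illustrative.

```shell
# Sketch: report the share of full (expensive) SSL handshakes in an
# IHS error log. ssl_handshake_stats is a hypothetical helper name.
# Assumes the "Session ID: ... (new)" / "(reused)" message format.
ssl_handshake_stats() {
    log_file="${1:-logs/error_log}"
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero count from being treated as a failure.
    new_count=$(grep -c 'Session ID:.*(new)' "$log_file" || true)
    reused_count=$(grep -c 'Session ID:.*(reused)' "$log_file" || true)
    awk -v n="$new_count" -v r="$reused_count" 'BEGIN {
        total = n + r
        if (total == 0) { print "no handshakes logged"; exit }
        printf "new=%d reused=%d expensive=%.1f%%\n", n, r, 100 * n / total
    }'
}
```

Running it once against the log from each load test (for example, ssl_handshake_stats logs/error_log) gives the two percentages to compare.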

7.8. Session ID cache limits

IBM HTTP Server uses an external session ID cache with no practical limits on the number of session IDs unless the operating system is Windows or the directive SSLCacheDisable is present in the IHS configuration.

When the operating system is Windows or the SSLCacheDisable directive is present, IBM HTTP Server uses the GSKit internal session ID cache which is limited to 512 entries by default.

This limit can be increased to a maximum of 4095 (64000 for z/OS) entries by setting the environment variable GSK_V3_SIDCACHE_SIZE to the desired value.
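A minimal sketch of raising the limit, assuming a Unix installation where SSLCacheDisable is in effect and assuming that IHSROOT/bin/envvars is a suitable place to set the variable (the text above names only the variable itself). On Windows the variable would have to be set in the server's environment by other means.

```shell
# Assumed placement: end of IHSROOT/bin/envvars.
# 4095 is the maximum stated above for platforms other than z/OS.
GSK_V3_SIDCACHE_SIZE=4095
export GSK_V3_SIDCACHE_SIZE
```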

8. Network Tuning

8.1 All platforms

Problem Description

Low data transfer rates handling large POST requests.

This problem can be caused by a small TCP receive buffer size being used for web server sockets. This results in the client being limited in how much data it can send before the server machine has to acknowledge it, resulting in poor network utilization.

Resolution

Some data transfer performance problems can be solved using the native operating system mechanism for increasing the default size of TCP receive buffers. IBM HTTP Server must be restarted after making the change.

Platform	Tuning parameter	Instructions
AIX	tcp_recvspace	Run no -o tcp_recvspace to display the old value. Run no -o tcp_recvspace=new_value to set a larger value.
Solaris	tcp_recv_hiwat	Run ndd /dev/tcp tcp_recv_hiwat to display the old value. Run ndd -set /dev/tcp tcp_recv_hiwat new_value to set a larger value.
HP-UX	tcp_recv_hiwater_def	Run ndd /dev/tcp tcp_recv_hiwater_def to display the old value. Run ndd -set /dev/tcp tcp_recv_hiwater_def new_value to set a larger value.
Linux	rmem_default	Run cat /proc/sys/net/core/rmem_default to display the old value. Run echo new_value > /proc/sys/net/core/rmem_default to set a larger value.

The following levels of IBM HTTP Server contain a ReceiveBufferSize directive for setting this value in a platform-independent manner, and only for the web server:

2.0.42.2 with cumulative e-fix PK07831 or later
2.0.47.1 with cumulative e-fix PK07831 or later
6.0.2 or later
(6.0.2.1 or later on Windows)

Usage:

ReceiveBufferSize number-of-bytes

This directive must appear at global scope in the configuration file.

Making the adjustment

Check the current system default using the platform-specific command in the previous table.
Use either 131072 bytes, or twice the current system default, whichever is greater.
Example ReceiveBufferSize directive:
ReceiveBufferSize 131072
If the ReceiveBufferSize directive is not available, use the platform-specific command in the previous table to change the system default.
Restart the web server, then retry the testcase.
If POST performance did not improve enough, double the receive buffer value and try again.
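The sizing rule in the steps above (131072 bytes or twice the current default, whichever is greater) can be sketched as a small shell function. The name suggest_recv_buffer is hypothetical, and the commented example assumes Linux, where the current default is read from the file named in the table.

```shell
# Sketch: suggest a starting receive buffer size in bytes.
# suggest_recv_buffer is a hypothetical helper name.
# Argument 1: the current system default, in bytes.
suggest_recv_buffer() {
    current="$1"
    doubled=$(( current * 2 ))
    if [ "$doubled" -gt 131072 ]; then
        echo "$doubled"
    else
        echo 131072
    fi
}

# Example on Linux:
# echo "ReceiveBufferSize $(suggest_recv_buffer "$(cat /proc/sys/net/core/rmem_default)")"
```

If POST performance is still poor after a restart, the same function can be called again with the value just tried, which doubles it as the last step describes.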

8.2 AIX

Problem Description

Low data transfer rates running on AIX 5 when handling large (multi-megabyte) POST requests from Windows machines. Network traces show large delays (~150 ms) between packet acknowledgments.

Resolution

This performance problem can be corrected by setting an AIX network tuning option and applying AIX maintenance.

For all releases of AIX, set the tcp_nodelayack network option to 1 by using the following command:

no -o tcp_nodelayack=1

For AIX 5.1, apply the fix for APAR IY53226. For more information, see: IY53226

For AIX 5.2, apply the fix for APAR IY53254. For more information, see: IY53254

Problem Description

Unexpected network latency when the application is somewhat slow. Network traces show a normal HTTP 200 OK message for the first part of the response, then AIX waits ~150ms for a delayed ACK from the client.

Resolution

This performance problem can be corrected by setting an AIX network tuning option.

Set the rfc2414 network option to 1 by using the following command:

no -o rfc2414=1

9. Operating System Tuning Reference Materials

Instructions for tuning some operating system parameters are available in the WebSphere InfoCenter. Many of these parameters, such as TCP layer configuration or file descriptor configuration, apply to IBM HTTP Server as well.

10. Memory use comparison between IBM HTTP Server 1.3 and IBM HTTP Server 2.0

This comparison is not applicable to IBM HTTP Server on Windows, where memory usage is much more similar between 1.3 and 2.0.

Many customers on Unix systems have encountered paging (swap) space or physical memory problems with IBM HTTP Server 1.3 due to the large number of child processes which may be required, and the memory overhead per child process.

On AIX and Solaris, paging space is allocated based on the virtual memory size of the process, even for pages which are shared with the httpd parent process and will never be modified. For IBM HTTP Server 1.3, the majority of the virtual memory in a child process is shared with the parent process and never modified in the child, so while it contributes to the paging space usage (a disk allocation issue) it does not contribute to active paging (a performance issue).

This information can help determine how much paging space is required, as well as show some of the benefits of migrating to IBM HTTP Server 2.0 or later.

Customers should expect high virtual memory use for the entire set of IBM HTTP Server 1.3 processes; customers should check paging space utilization when encountering problems related to virtual memory, and ensure that enough paging space has been allocated to support the maximum configured number of httpd processes.

Scenario 1 for comparison

IBM HTTP Server versions

1.3.28.1 with latest maintenance; 2.0.47.1 with latest maintenance; WebSphere 5.1.1.8 plug-in

Operating system

Solaris 9

MaxClients

500; for IBM HTTP Server 2.0.47.1, two child processes will be required; for IBM HTTP Server 1.3.28.1, 500 child processes will be required

https transports in WebSphere plug-in

One https transport will be configured. Note: Additional https transports add around 400KB more memory per child process. A configuration with multiple https transports will see much greater benefits when switching to IBM HTTP Server 2.0 or above.

SSL-enabled virtual hosts in web server

One SSL-enabled virtual host will be configured. Note: Additional SSL-enabled virtual hosts add around 400KB more memory per child process. A configuration with multiple SSL-enabled virtual hosts will see much greater benefits when switching to IBM HTTP Server 2.0 or above.

no memory-based caching enabled

The WebSphere plug-in ESI cache feature is available with either 1.3.28.1 or 2.0.47.1. There is one copy of the cache per child process, so it is much more memory-efficient with IBM HTTP Server 2.0 or above.

Process management configuration

1.3.28.1

StartServers 5

MaxClients 500

MaxSpareServers 500

MinSpareServers 1

MaxRequestsPerChild 0

2.0.47.1

<IfModule worker.c>

ServerLimit 2

ThreadLimit 250

StartServers 2

MaxClients 500

MinSpareThreads 1

MaxSpareThreads 500

ThreadsPerChild 250

MaxRequestsPerChild 0

</IfModule>

Memory use measurements

1.3.28.1

(from ps -A -o pid,ppid,vsz,rss,comm)

PID PPID VSZ RSS COMMAND

22729 22676 13448 4768 bin/httpd

22721 22676 13448 4768 bin/httpd

22734 22676 13448 4768 bin/httpd

22745 22676 13448 4768 bin/httpd

22737 22676 13448 4768 bin/httpd

22719 22676 13448 4768 bin/httpd

22740 22676 13448 4768 bin/httpd

22731 22676 13448 4768 bin/httpd

22728 22676 13448 4768 bin/httpd

22741 22676 13448 4768 bin/httpd

22720 22676 13448 4768 bin/httpd

22724 22676 13448 4768 bin/httpd

22746 22676 13448 4768 bin/httpd

22717 22676 13448 4768 bin/httpd

22730 22676 13448 4768 bin/httpd

22718 22676 13448 4768 bin/httpd

22722 22676 13448 4768 bin/httpd

22732 22676 13448 4768 bin/httpd

22743 22676 13448 4768 bin/httpd

22739 22676 13448 4768 bin/httpd

22733 22676 13448 4768 bin/httpd

22676 1 13448 8760 bin/httpd

22742 22676 13448 4768 bin/httpd

(and 478 more children)

Totals: Total virtual memory size is about 6.7 GB. Total resident set size is about 2.4 GB.

2.0.47.1

(from ps -A -o pid,ppid,vsz,rss,comm)

PID PPID VSZ RSS COMMAND

394 390 44240 36696 /home/trawick/testihsbuild/ihsinstall/bin/httpd

390 1 15136 9528 /home/trawick/testihsbuild/ihsinstall/bin/httpd

393 390 44144 36536 /home/trawick/testihsbuild/ihsinstall/bin/httpd

392 390 14552 3328 /home/trawick/testihsbuild/ihsinstall/bin/httpd

Totals: Total virtual memory size is about 117 MB. Total resident set size is about 86 MB. Note that IBM HTTP Server 2.0 and above has an extra child process when CGI requests are enabled.
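Totals like these can be produced by summing the VSZ and RSS columns of the ps output. The sketch below assumes the column order of ps -A -o pid,ppid,vsz,rss,comm with values in kilobytes; the function name sum_httpd_memory is hypothetical.

```shell
# Sketch: total the VSZ and RSS columns for httpd processes.
# sum_httpd_memory is a hypothetical helper name.
# Reads "PID PPID VSZ RSS COMMAND" rows on standard input and skips
# any row (including the header) whose command does not end in httpd.
sum_httpd_memory() {
    awk '$NF ~ /httpd$/ { vsz += $3; rss += $4; n++ }
         END { printf "processes=%d vsz_kb=%d rss_kb=%d\n", n, vsz, rss }'
}

# Example:
# ps -A -o pid,ppid,vsz,rss,comm | sum_httpd_memory
```

Because pages shared between parent and children are counted once per process, the summed figures overstate the physical memory actually in use; they are most useful for comparing configurations and for sizing paging space.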

Scenario 2 for comparison

This is the same except for the OS, which is AIX 5.3.

Memory use measurements

1.3.28.1

(from ps auxw then post-processing and picking the largest child)

USER SZ RSS

root 11800 800

nobody 12956 1976

(and 499 more children like this)

Totals: Total virtual memory size is about 6.5 GB. Total resident set size is about 1 GB.

2.0.47.1

(from ps auxw then post-processing)

USER SZ RSS

nobody 32876 32932

nobody 33348 33384

root 632 668

nobody 636 808

Totals: Total virtual memory size is about 68 MB. Total resident set size is about 68 MB. Note that IBM HTTP Server 2.0 and above has an extra child process when CGI requests are enabled.

11. Slow startup, or slow response time from proxy or LDAP with IBM HTTP Server 2.0 or above on AIX

In support of IPv6 networking, these levels of IBM HTTP Server will query the resolver library for IPv4 and IPv6 addresses for a host. This can result in extra DNS queries on AIX, even when the IPv4 address is defined in /etc/hosts. To work around this issue, IPv6 lookups can be disabled.

System-wide setting

Edit /etc/netsvc.conf, which configures the resolver system-wide. Add or modify the lookup rule for hosts so that it has this setting:

hosts=local4,bind4

That will disable IPv6 lookups. Now restart IBM HTTP Server and confirm that the delays with proxy requests or LDAP have been resolved.

IHS-specific setting

Add this to the end of ihsroot/bin/envvars:

NSORDER=local4,bind4

export NSORDER

12. High disk I/O with IBM HTTP Server on AIX

A customer reported that an internal disk mirror showed a high level of write I/O every 60 seconds which was related empirically to client load on the web server and which was determined to be unrelated to logging. AIX support narrowed down the specific web server activity related to the high write I/O and determined that it was due to file access times being updated by the filesystem when the web server served the page.

IBM HTTP Server 2.0 and above can send static files using the AIX send_file() API, which in turn can enable the AIX kernel to deliver the file contents to the client from the network buffer cache. This results in the file access time remaining unchanged, which solved this particular disk I/O problem.

The use of send_file() is controlled with the EnableSendfile directive. Several potential problems must be considered when IBM HTTP Server uses send_file(); thus it is disabled by default in the configuration files provided with the last several releases.

For details, see "Work-arounds for potential problems with IBM HTTP Server and send_file()".

13. High CPU in child processes after WebSphere plugin config is updated

The WebSphere plugin will normally reload the plugin configuration file (plugin-cfg.xml) during steady state operation if the file is modified. When the reload occurs during steady state operation, it must be reloaded in every web server child process serving requests. Initialization of https transports is particularly CPU-intensive so, if there are many such transports defined or many child processes, the CPU impact can be high.

One way to address the issue on Unix and Linux platforms is to disable automatic reload by setting RefreshInterval to -1 in plugin-cfg.xml, then use the apachectl graceful command to restart the web server when the new plug-in configuration must be activated. This will result in the reload occurring only once, in the IHS parent process. The new configuration will be inherited by the new child processes which are created by the restart operation.
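Before relying on a graceful restart to activate plug-in changes, it can be worth confirming that automatic reload really is disabled. The following is a sketch only: plugin_reload_disabled is a hypothetical helper, the paths in the commented example are placeholders, and the check assumes the attribute is written exactly as RefreshInterval="-1".

```shell
# Sketch: succeed only if the given plugin-cfg.xml disables automatic
# reload. plugin_reload_disabled is a hypothetical helper name; the
# file path is an argument because install locations vary.
plugin_reload_disabled() {
    grep -q 'RefreshInterval="-1"' "$1"
}

# Example use (paths are placeholders):
# if plugin_reload_disabled /path/to/plugin-cfg.xml; then
#     IHSROOT/bin/apachectl graceful
# fi
```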

Another way to address this issue is to utilize WebSphere 6.1 (or later) webserver definitions. This will allow you to have smaller plug-in config files because they are broken down in a way that each plugin-cfg.xml is only generated with the transports relevant to that web server. When the reload occurs, it doesn't reinitialize all the transports; only the one in the config that changed will be reinitialized.