
Evaluating the Effectiveness of Training Programs

 
admin
Site Admin


Joined: 05 Jul 2005
Posts: 689

Posted: Fri Dec 02, 2005 5:22 am    Post subject: Evaluating the Effectiveness of Training Programs

FACTORS AFFECTING ADULT EDUCATION

Before discussing the evaluation of training, it is important to explain the elements that serve as an impetus for adult education and workplace training. Merriam and Caffarella (1991) identify three major areas of change that influence adult learning: demographic changes, economic changes, and technological changes.

Demographic Changes

One of the changing elements of demographics is age. According to Merriam and Caffarella, there are more Americans aged sixty-five and older than there are Americans aged twenty-five and younger. The aging of the population will continue well into the next century, and the demand for quality adult education will rise accordingly. The rapid growth of cultural and ethnic diversity in the U.S. is another changing element of demographics. The United States is now experiencing a wave of immigration, primarily from Asia and Latin America, that parallels the influx of Europeans at the beginning of this century. By the year 2000, minorities are expected to compose 29 percent of the U.S. population (Merriam & Caffarella, 1991). In order to tap this valuable resource, employers will need to provide specialized training to help these people adjust to the American workplace.

Economic Changes

Economic changes are also having and will continue to have an impact on adult learning and training. Many experts, such as Naisbitt and Aburdene (1990), contend that the economies of the world are now interdependent. Consequently, major companies are allowing, encouraging, and sometimes subsidizing their employees' education in order to become more competitive and to increase their chances for survival in a world economy.

Another critical economic change is the shift from a manufacturing economy to a service economy, which has produced a change in the job market and has affected the kind of training that employees need. Moreover, changes in the composition of the U.S. work force itself are influencing training. For example, since World War II women have become an integral part of American organizations.

Technological Changes

Finally, the advancements in technology will continue to shape and define adult training needs, primarily because of the advent of the personal computer. Computers have revolutionized every aspect of corporate education, allowing people to produce, analyze, and manipulate data with greater ease than before. By some accounts (for example, Apps, 1988), every seven years the amount of information generated in the world doubles. Furthermore, about half of the information that most professionals learn will be outdated in about five years.

The implication of technological advancements is that learning is a lifelong proposition. Not only will there be a demand for training to keep up with technological advancements, but there also will be a demand to retrain the millions of Americans who will be displaced because of such advancements.

WHY EVALUATE?

Although for many years trainers have attempted to evaluate their programs, until quite recently, there has not been a bona fide effort to use valid and reliable methods to conduct such evaluations. Furthermore, some trainers gather data for evaluation but do not analyze those data for trends or use them to improve existing training programs. Such an oversight can be costly, especially in light of the billions of dollars that have been spent and will continue to be spent annually on training efforts as a result of the demographic, economic, and technological changes just discussed.

It is important to remember that effective evaluation is multifaceted. All of the literature recognizes the importance of evaluation in terms of client orientation and economic return. In other words, most researchers in the field understand that clients, whether they are those who have hired the trainer or those who have participated in the training, must be satisfied with that training. If clients do not perceive a return on their investment, whether measured in terms of time or dollars, they may not be willing to continue to invest in training.

LEVELS OF EVALUATION

There are several components to an effective evaluation program. One of the most comprehensive and widely referenced models of evaluation is Donald Kirkpatrick's (1979). The four levels of this model are as follows:

1. Reaction

2. Learning

3. Behavior

4. Results

The balance of this article reviews the current research on evaluation in light of Kirkpatrick's model.

Level 1: Reaction Evaluation

Reaction is the term that Kirkpatrick uses to refer to how well the participants liked a particular training program. Evaluation of participants' reactions consists of measuring their feelings; it does not include a measure of actual learning. Kirkpatrick contends that although the evaluation of reactions is an easy measurement, many trainers do not follow these five essential steps for accurate measurement:

1. Determine what information is desired.

2. Devise a written "comment sheet" that includes items determined in the previous step.

3. Design the sheet so that reactions can be easily tabulated and manipulated by statistical means.

4. Make the sheets anonymous.

5. Encourage the participants to make additional comments not elicited by questions on the sheet.

Although Kirkpatrick suggests that participants should feel free and be encouraged to make additional comments, he also contends that this type of qualitative data is extremely difficult to analyze. Thus, it is difficult to discern any patterns or trends in order to revise the training program.
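The tabulation called for in step 3 can be illustrated with a minimal sketch. It assumes each anonymous comment sheet is recorded as a set of 1-to-5 ratings; the item names and the ratings themselves are hypothetical and are not taken from Kirkpatrick or any other source cited here.

```python
from collections import Counter
from statistics import mean

# Hypothetical anonymous comment sheets: each maps an item to a rating
# from 1 (strongly disagree) to 5 (strongly agree).
sheets = [
    {"useful_information": 4, "clear_delivery": 5},
    {"useful_information": 3, "clear_delivery": 4},
    {"useful_information": 5, "clear_delivery": 4},
    {"useful_information": 4, "clear_delivery": 2},
]


def tabulate(sheets):
    """Return, for each item, the count of each rating, the mean, and n."""
    summary = {}
    items = sorted({item for sheet in sheets for item in sheet})
    for item in items:
        # Skip sheets on which the participant left this item blank.
        ratings = [sheet[item] for sheet in sheets if item in sheet]
        summary[item] = {
            "counts": dict(sorted(Counter(ratings).items())),
            "mean": mean(ratings),
            "n": len(ratings),
        }
    return summary


summary = tabulate(sheets)
for item, stats in summary.items():
    print(item, stats)
```

A summary of this kind describes only how participants rated the program; it says nothing about what they learned, and free-text comments would still have to be reviewed separately.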

Other researchers have different perspectives regarding the evaluation of participants' reactions. For instance, Antheil and Casper (1986) state that participant reaction is a measure of "customer satisfaction" indicating the level of effectiveness and usefulness of the training program at the time the participants are experiencing it and sometimes weeks or even months afterward. However, they are careful to stress that data collected regarding participant reactions reflect participant opinions and should not be considered proof of learning.

To determine what training-evaluation tools were being used by industry, Fisher and Weinberg (1988) of Bell Communications Research, Incorporated (Bellcore) conducted a phone survey in March of 1986. The data indicated that the typical instrument to gather information regarding reactions was a "short, quickly constructed, open-ended questionnaire" (p. 73). This "happy sheet" (p. 73), as Fisher and Weinberg refer to it, provided subjective impressions and no data that could withstand statistical analysis or measures for reliability. Because there was no adequate tool for evaluation, the Bellcore System developed a new instrument with items addressing the trainer's behavior, the participant's experience, and other issues phrased as open-ended questions.

This questionnaire, like most such instruments, focuses on participant reactions, not on learning or the transfer of learning. For instance, one item on the questionnaire reads, "The course presented useful information" (p. 76). The participants are then asked to rate the statement on a Likert scale. Fisher and Weinberg (1988) warn that while this questionnaire does provide a "general estimate of a particular course's success based upon the views of the participants" (p. 75), the data may be somewhat inaccurate because participants have a tendency to report what a trainer wants to hear. Also, some questionnaires have poorly constructed questions or items that predispose participants to respond in predicted ways.

Some trainers and researchers feel that measurements of participant reactions are inaccurate and counterproductive. For instance, Conway and Ross (1984) found that participants have a tendency to underestimate their pretraining skills and overestimate their posttraining skills in an attempt to justify participating in the training. Their research is consistent with research in the field of social psychology indicating that people have a strong need to justify their behavior and actions and consequently may alter their opinions and their interpretation of past events. Therefore, if trainers continue to use participant reactions as the sole means of evaluation, and management continues to allow such use, the outcome can be misleading and extremely costly.

Carnevale and Schulz (1990) go a step further. They claim that "participant reactions are easy to collect but provide little substantive information about training's worth" (p. s-15). They also claim that because data concerning participant reactions do not reveal the actual learning that has taken place, those data do not accurately indicate the return on investment for training efforts. They state that because of such unreliable data, many trainers have stopped using reaction sheets. However, Carnevale and Schulz go on to say that most trainers believe participants' favorable reactions are crucial to a program's success and that participants whose reactions are favorable tend to be more receptive to the material and consequently more likely to use it on the job.

Dixon (1987, p. 108) claims that "the use of participant reaction forms can cause more problems than benefits for the training function of an organization." This statement is especially true when participant reactions are the only evaluation method used. Dixon contends that three major problems result from the use of reaction forms:

1. The expectation that training must be entertaining. Because reaction sheets measure how the participants felt about the training, the trainer may tend to emphasize participant enjoyment during the training rather than substantive information. As a trainer is often rewarded with high marks when the participants enjoy themselves, this relationship between evaluation and participant enjoyment can become a vicious cycle. The trainer's ratings are also a major factor in the rewards that the trainer receives from management or the client organization: renewal of a contract or a promotion. Obviously, under these circumstances the use of a reaction sheet can lead to a conflict of interest.

2. Faulty instructional design. The term "faulty instructional design" refers to a questionnaire design that asks for information that participants cannot legitimately provide. As Dixon (1987) states, the art of questionnaire design is to ask questions for which a participant can give informed responses.

3. The perception that learning is passive rather than active. This perception refers to the common belief that it is the trainer's responsibility to ensure that participant learning occurs. Measuring how well this responsibility has been met with a reaction sheet is problematic, as a reaction sheet asks questions about the trainer's performance and the course design without asking about the participants' efforts to learn. Dixon emphasizes that evaluation and learning are not complete unless both functions have been measured. Ultimately, it is the responsibility of the trainer to provide information and the responsibility of the participant and the trainer to process the information. Reaction sheets rarely take into account the participant's role as part of the training program.

Level 2: Learning Evaluation

According to Kirkpatrick (1979), the second level of analysis in the evaluation process is that of learning. Kirkpatrick defines learning as the "principles, facts and techniques that were understood and absorbed by the participants" (p. 82) and identifies the following guidelines or standards for evaluation in terms of learning:

1. Each participant's learning should be measured by quantitative means.

2. A pretest and posttest should be administered so that any learning can be attributed to the training program.

3. The learning should be measured by objective means.

4. When feasible, a control group should be used so that comparisons can be made with the actual training group.

5. When feasible, the evaluation results should undergo statistical analysis so that learning can be viewed in terms of correlation and/or levels of confidence.

Obviously, learning is much more difficult to evaluate than reaction. According to Kirkpatrick's guidelines, a knowledge of statistical procedures is essential for accurate and meaningful measurement.
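As a rough illustration of the pretest, posttest, and control-group guidelines, the sketch below compares the average score gain of a trained group with that of an untrained group. All scores are hypothetical, the function name is invented for this example, and the calculation is a simple difference of mean gains rather than the full statistical analysis Kirkpatrick has in mind.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average posttest-minus-pretest change for one group of paired scores."""
    if len(pre) != len(post):
        raise ValueError("pretest and posttest lists must pair up")
    return mean(after - before for before, after in zip(pre, post))


# Hypothetical test scores (percent correct); not data from any study.
trained_pre = [52, 60, 48, 55]
trained_post = [78, 81, 70, 75]
control_pre = [50, 58, 51, 57]
control_post = [54, 60, 53, 61]

trained_gain = mean_gain(trained_pre, trained_post)
control_gain = mean_gain(control_pre, control_post)

# Gain in the trained group beyond the gain the control group showed anyway.
net_gain = trained_gain - control_gain
print(trained_gain, control_gain, net_gain)
```

With groups this small, a difference in mean gains would not by itself establish that the training caused the improvement; a significance test on larger samples would still be needed to express the result in terms of levels of confidence.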

Endres and Kleiner (1990) state that pretests and posttests are necessary when evaluating the amount of learning that has taken place. Without a point of comparison, the measurement of learning at the end of the training program will not reveal exactly how much knowledge has been obtained from the training experience. Although paper-and-pencil tests are the most frequently used tools to measure knowledge, there are other means for gathering this kind of data.

For instance, when simulations, role plays, or demonstrations are used to measure knowledge, the trainer can use before-and-after situations in which participants can demonstrate or perform the knowledge and techniques that they have learned. This information is consistent with Kirkpatrick's research on the measurement of learning. In fact, like Endres and Kleiner, Kirkpatrick maintains that simulations and demonstrations can closely approximate the participants' work environment and can help them relate the learning in meaningful ways, especially when specific job skills are the focus of the training.

According to Carnevale and Schulz (1990), the measurement tools used to evaluate learning should reflect each training program's particular objectives. Also, measures of learning changes may be taken during or at the end of a training session. Carnevale and Schulz warn that such a measure of learning changes "may indicate that a program's instructional methods are effective, but it doesn't show whether or how participants' new learning will be applied on the job" (p. s-16).

A useful process for reviewing items on a measurement tool that evaluates learning has been suggested by Cantor (1990):

1. Determine the acceptable task level by objective.

2. Determine whether each objective is adequate.

3. Identify the items associated with each objective.

4. Determine whether the items match the objectives.

These steps are consistent with the instructional systems design method and will help ensure that items will be reliable and valid means for determining whether learning has occurred.

Research by Antheil and Casper (1986, p. 58) indicates that "evaluation of learning at this level closely resembles testing" and most often takes the form of paper-and-pencil tests. They suggest that the typical measurement tool includes gathering pretest and posttest data to determine the amount of learning that has been acquired. They also stress that skill demonstrations in a learning situation merely indicate whether a participant can use the skills, not whether he or she will use them.

Level 3: Transfer-of-Learning Evaluation

Kirkpatrick's third level in the evaluation model is transfer of learning. In the HRD literature there are relatively few examples of studies that have specifically attempted to assess the transfer of training skills or knowledge to the job. Even Kirkpatrick (1979, p. 86) warns that "evaluation of training programs in terms of on the job behavior is more difficult than the reaction and learning evaluations...." As a result, much training is delivered without a plan for measuring the transfer of training. Kirkpatrick goes on to suggest a framework for evaluating training programs in terms of behavioral changes:

1. A systematic appraisal should be made of on-the-job performance on a before-and-after basis.

2. The appraisal of performance should be made by one or more of the following parties (the more the better): the participant; the participant's superior(s); the participant's subordinates; and/or the participant's peers or other people who are familiar with the participant's performance.

3. A statistical analysis should be made to compare before-and-after performance and to relate changes to the training program.

4. The post-training appraisal should be made three months or more after the training so that the participants have an opportunity to practice what they have learned. Subsequent appraisals may add validity to the study.

5. A control group (of people who did not receive the training) should be used.

Antheil and Casper (1986) propose a comprehensive evaluation model based on Kirkpatrick's four levels, which they call "program effects levels." Their three-step procedure for implementing the model is as follows:

1. Discuss the focus and goals of the evaluation study with the identified evaluation audience.

2. Design and implement data-collection strategies aimed at tapping one or more levels of program effects. These strategies should reflect the audience's expressed needs for information.

3. Communicate evaluation results to the audience through a process that incorporates various user needs and abilities to learn from and use results. Encourage joint interpretation of the data.

Antheil and Casper (1986) emphasize the importance of collecting and presenting the information in a way that will be meaningful and relevant for the specific audience involved. This level of evaluation not only assesses the performance of the person who receives the training, but also provides valuable feedback to those involved in redesigning existing training programs or in designing programs to meet future needs. This information is also useful to those who will be evaluating the effectiveness of the overall training program. The collection of qualitative as well as quantitative data is encouraged by Antheil and Casper. They suggest logs, diaries, and observer narratives, for example.

Endres and Kleiner (1990) use Kirkpatrick's model in suggesting an approach to evaluating the effectiveness of management training. They caution against relying on in-house performance-appraisal systems as the primary measure of transfer of learning, as it is difficult to separate the effects of training efforts from those of other factors. Instead, they suggest setting initial performance objectives and monitoring accomplishment of those objectives after training. They offer an example in which participants write personal and professional objectives at the end of the training experience. These objectives are then sent to the participants approximately a week after the training. Two months later they are sent again, and the participants are asked to comment on their performance against these objectives. A certificate of completion for the training is issued only after each participant's feedback is secured.

Like Kirkpatrick, Endres and Kleiner suggest multidimensional on-the-job evaluations, including feedback from the participant, his or her subordinates, and peers. "By using all three forms of feedback," they say, "the built-in biases of the evaluator can be reduced as the number of evaluators having different perspectives is increased" (p. 6).

Finally, they remind evaluators that other factors can impact the effectiveness of management training and development, including the manager, the trainer, the organization, and the environment. As they state, "All four are complex creatures" (p. 7).

Nanda (1988) also looked at the transference of supervisory skills following training programs and found that most supervisory-training programs are knowledge based. However, to be of value to the trainee and the organization, that knowledge must result in a change of attitude, followed by a change in the supervisor's behavior. Unfortunately, the impact of most supervisory-training programs does not go beyond knowledge and awareness. One factor that often inhibits transference of learning is the organizational climate, which may be inconsistent with what is taught in the training program. This inconsistency often renders such training programs entirely ineffective. As Nanda (p. 28) says, "perhaps changes in attitude among top managers are key to the skill development of supervisors."

The instrumental impact of the on-the-job environment is consistent with Bandura's findings in the studies that led to the development of Social Learning Theory. Bandura (1965) found that any learning that may have been gained by observing the behavior of models was completely wiped out by the subsequent incentives received for the performance of a specific response, leading him to conclude that "mere exposure to modeling stimuli does not provide sufficient conditions for imitative or observational learning" (p. 593).

Kelly (1982) starts with the assumption that typically only 10 percent of a company's training transfers skills to the job. What happens to the other 90 percent of training? She suggests that 40 percent is lost because the training function is often isolated or peripheral: "Therefore, management, who views anyone paid to do a peripheral job as a peripheral person, will not bring that person's ideas into the workplace" (p. 102). An additional 40 percent, she suggests, is lost because most trainers or management educators do not build transfer into the training programs. Finally, 10 percent may be lost when the course designer does not deliver the training.

For skills to be transferred to the job, Kelly believes that they must be built into the training "before the first specific behavioral objective is chosen, before the first course activity is imagined or before a packaged product is selected" (p. 104). In other words, the course should be designed with the specific intent of transfer to the actual job situation.

Kelly's comment stresses an important point. In order to study whether skill transfer related to training has in fact occurred, one must establish a baseline of current skills or knowledge before the training occurs. For example, six months after a two-day workshop on supervisory skills, Swierczek and Carmichael (1985) conducted a survey in which they attempted to measure whether participants in the workshop actually used the learned skills. They found that they were hampered by the lack of baseline information: "Therefore the results cannot be linked definitively to the workshop" (p. 97). Of course, following a good process for instructional system design would suggest pretesting, both to establish such a baseline and to determine the need for training in the first place.

Mahoney (1980) suggests that management training be evaluated using three criteria:

Targets - working on relevant issues; Time - working efficiently; and Transfer - producing results on the job.

To optimize transference of management training, Mahoney suggests that a manager who wishes to train subordinates in supervisory skills should conduct a series of working meetings on specific issues. The issues selected should be ones that have been identified by the subordinates (thus meeting the "targets" criterion). Next, Mahoney suggests that the training be divided into a series of half-day segments. One criticism of training is that it takes too much time both for the participant and for the manager/trainer. However, if the working meetings are limited to four to eight times per year, they represent only a 1- to 2-percent investment of each subordinate's time (thus meeting the "time" criterion). Finally, Mahoney suggests that training be designed with an "action research" process in mind. With proper selection of training topics and content, the manager's subordinates will actually take their jobs with them to each training session. Thus, "the job and the training are separated only by the training setting, not by process and not by content" (p. 29) (thus meeting the "transfer" criterion).
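The arithmetic behind the 1- to 2-percent figure can be sketched as follows. Mahoney's text as summarized here does not state the length of the working year, so the 200 working days used below is purely an assumption chosen for illustration, as is the function name.

```python
def time_investment(sessions_per_year, days_per_session=0.5,
                    working_days_per_year=200):
    """Fraction of a subordinate's working year spent in training sessions.

    working_days_per_year=200 is an assumed figure, not one from the source.
    """
    return sessions_per_year * days_per_session / working_days_per_year


low = time_investment(4)   # four half-day meetings per year
high = time_investment(8)  # eight half-day meetings per year
print(f"{low:.1%} to {high:.1%}")

# A longer assumed working year lowers the share of time somewhat.
low_240 = time_investment(4, working_days_per_year=240)
high_240 = time_investment(8, working_days_per_year=240)
```

Under either assumption the investment stays at or below about 2 percent of each subordinate's time, which is the point of the "time" criterion.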

A synthesis of the literature reviewed here suggests the following ten guidelines for designing training that ensures transfer:

1. Build a plan for transfer into the training program from the outset.

2. Make sure that the work environment provides positive incentives to apply the skills gained in training.

3. Consider the audience, that is, the people who will use the evaluation results. Collect data and report results with the audience in mind.

4. Set initial performance targets based on the training needs identified in the assessment phase.

5. Use specific topics that are relevant and job related.

6. Use the work-group manager or supervisor to deliver the training whenever possible.

7. Keep training sessions short.

8. Ensure that practice during the training sessions clearly matches the on-the-job situation.

9. Plan for the assessment of skill transfer to be multidimensional, including the participant as well as the participant's subordinates, peers, and supervisor(s) whenever possible.

10. Do not consider the training to be complete until transference has been evaluated.

It is interesting to note that if transfer of learning is considered at all, this consideration usually occurs after the training has been designed or even delivered. However, most of the guidelines suggested above should be followed during the design phase.

Level 4: Results Evaluation

Kirkpatrick's fourth level of evaluation is results or impact on the organization. Attempting to measure results is not for the fainthearted! Although measuring training programs in terms of results may be the best way to measure effectiveness, Kirkpatrick himself (1979, p. 89) points out that "there are ... so many complicating factors that it is extremely difficult if not impossible to evaluate certain kinds of programs in terms of results." The separation of variables to measure how much of the improvement is due to training is extremely difficult. Instead of offering a specific formula, Kirkpatrick simply reports anecdotal efforts to measure results. He does applaud attempts by researchers such as Likert to use qualitative data in measuring results, but he laments the fact that current research techniques are essentially inadequate and that progress in this area is slow.

Zenger and Hargis (1982) recommend experimental-research designs using pretesting and posttesting of experimentally trained groups with untrained control groups. However, outside an ideal laboratory environment, this approach is not without its challenges.

Ban and Faerman (1990) report on their attempt to measure both skill transference and results following an intensive, twenty-four-day advanced supervisory-training program. They had hoped to study impact with an experimental design by surveying a control group of managers who had not participated in the training program. However, they had to abandon this part of their study because of logistical problems. They conclude that "the literature on training evaluation may be too optimistic in recommending experimental or quasi-experimental design for many field situations" (p. 278).

Similarly, Trapnell (1984, p. 92) remarks that "impact evaluation is not a science" because of the number of variables other than training that may affect long-term results. Despite this comment, though, Trapnell encourages the use of available secondary data, such as savings resulting from reductions in downtime, accident rates, absenteeism, customer returns, assembly-line rejects, staff turnover, and employee grievances.

In an update to Zenger and Hargis' 1982 article, Kelly, Orgel, and Baer (1984) recommend quasi-experimental designs based on samples and groups that exist naturally in the work environment. An example would be two similar departments, one that receives training and one that does not. Rather than evaluating performance differences statistically and presenting those statistics (which, according to them, few people really understand), they suggest demonstrating results visually through graphic presentations.

The literature offers an account of at least one attempt to apply an econometric model to the evaluation of costs and benefits of training. Schmidt, Hunter, and Pearlman (1982) adapted "linear-regression-based decision-theoretic equations" to estimate the dollar impact of "intervention programs designed to improve job performance" (p. 333). The models they used were originally developed to estimate the dollar impact of valid selection procedures on work-force productivity. Typical studies on the value of selection procedures are highly statistical. However, "in general, organizational decision makers are less able to evaluate these statistics than statements made in terms of dollars" (p. 334). The model developed depends on a number of key assumptions, several of which must be inferred or estimated because they are not typically available from prior research.

Using their model, Schmidt, Hunter, and Pearlman estimated the value of a training program for one hundred programmers at more than one million dollars. In general, they hypothesized that "the economic impact of intervention programs may be greater than industrial/organizational psychologists have realized" (p. 340).

Using Schmidt, Hunter, and Pearlman's procedure, Sheppeck and Cohen (1985) propose a somewhat less statistical "utility" formula:

UTILITY = (YD x NT x PD x V) - (NT x C), where:

YD = years of duration of effect on performance; NT = number of employees trained; PD = performance difference between trained and untrained employees; V = "value," the standard deviation of job performance in dollars; and C = cost per trainee.

This formula still depends on estimates for several variables. The most obscure is the concept of "value," a statistic that is not readily available for most jobs. Sheppeck and Cohen provide several suggested estimates of "value" based on the few actual studies reported in the literature, but they suggest that Schmidt, Hunter, and Pearlman's range of 40 to 70 percent of annual salary is a reasonable estimate when actual figures are unknown. They suggest further studies in a variety of occupational settings to develop more precise, job-specific estimates for each of these variables.
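A worked example may make the utility formula more concrete. The sketch below is only an illustration: every input figure is hypothetical, the function and variable names are invented here, and treating PD as a difference expressed in standard-deviation units is an assumption about how the formula is meant to be read. The estimate of "value" uses the low end of the 40 to 70 percent of annual salary range mentioned above.

```python
def training_utility(years, n_trained, perf_diff, value, cost_per_trainee):
    """UTILITY = (YD x NT x PD x V) - (NT x C).

    years            -- YD, years the effect on performance lasts
    n_trained        -- NT, number of employees trained
    perf_diff        -- PD, performance difference between trained and
                        untrained employees (assumed to be in SD units)
    value            -- V, standard deviation of job performance in dollars
    cost_per_trainee -- C, cost per trainee in dollars
    """
    benefit = years * n_trained * perf_diff * value
    cost = n_trained * cost_per_trainee
    return benefit - cost


# Hypothetical inputs, chosen only to show the calculation.
annual_salary = 30_000
value = annual_salary * 40 / 100  # 40 percent of salary as the estimate of V

utility = training_utility(
    years=2,
    n_trained=100,
    perf_diff=0.5,
    value=value,
    cost_per_trainee=500,
)
print(f"Estimated utility: ${utility:,.0f}")
```

Because the result scales directly with YD, PD, and V, modest changes in any of these estimates move the dollar figure substantially, which is why more precise, job-specific estimates matter.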

Given the difficulty of results evaluations and the relative lack of objective, valid tools to use, are they worth pursuing? McEvoy and Buller (1990) suggest not only that it would be wise to think twice about pursuing such evaluations but also that not all training is results oriented. They describe their attempts to conduct a comprehensive, four-step evaluation of their training for developing executive leadership, which is similar to Outward Bound. They found that training is often used for purposes other than achieving a measurable impact on the performance of an individual employee or the organization. For example, sometimes training is seen as a perquisite for performance that has already been judged successful or as a cultural "rite of passage" that all those hoping to advance must complete. In these cases the value of the training is more symbolic than technical.

McEvoy and Buller even went so far as to use a utility formula similar to that described by Sheppeck and Cohen to assess the dollar impact of their program for one of their clients. Using the most conservative assumptions for the model, they still estimated a net benefit of over a half-million dollars! They decided not to share the estimate with the client because they did not think the client would believe it. The formula is not at all intuitive, and they reasoned that sharing the figure would hurt their credibility rather than help it.

These studies suggest that evaluating training on the basis of results or organizational impact may not be the ultimate measure. In the years since Kirkpatrick proposed his model, little has been added in the way of specific, valid tools to objectively measure training impact. Most promising are the quasi-experimental methods suggested by Kelly, Orgel, and Baer (1984) using graphic representations of hard data. Unfortunately, we may see few examples of this approach in the literature, as it lacks the scientific rigor that most journals favor.

It would also be a good idea to conduct further studies in a greater variety of occupational settings to determine reasonable, more precise estimates of performance differences between trained and untrained employees as well as value (that is, the standard deviation of job performance in dollars between trained and untrained employees). This research, however, may have some ethical hurdles to cross if it involves consciously withholding training from some people.