Effective Maintenance Management. V. Narayan
Reliability Engineering for the Maintenance Practitioner
We can now develop some of the reliability engineering concepts that we will need in subsequent chapters. Prior knowledge of the subject is not essential, as we will define the relevant terms and derive the necessary mathematical expressions. As this is not a text on reliability engineering, we will limit the scope of our discussion to the following areas of interest.
• Failure histograms and probability density curves;
• Survival probability and hazard rates;
• Constant hazard rates, calculation of test intervals, and errors with the use of approximations;
• Failure distributions and patterns, and the use of the Weibull distribution;
• Generation of Weibull plots from maintenance records;
• Weibull shape factor and its use in identifying maintenance strategies.
For a more detailed study of reliability engineering, we suggest that readers refer to texts 3, 4, and 6 listed at the end of the chapter.
We discussed failures at the system level in Chapter 2. Failures develop as the result of one or more modes of failure at the component level. In the example of the engine’s failure to crank, we identified three of the failure modes that may cause the failure of the cranking mechanism.
If designers and manufacturers are able to predict the occurrence of these failures, they can advise the customers when to take corrective actions. With this knowledge, the customers can avoid unexpected production losses or safety incidents. Designers also require this information to improve the reliability of their products. In mass-produced items, the manufacturer can test representative samples from the production line and estimate their reliability performance. In order to obtain the results quickly, we use accelerated tests. In these tests, we subject the item to higher stress levels or operate it at higher speeds than normal in order to initiate failure earlier than it would naturally occur.
Let us take as an example the testing of a switch used in industrial applications. Using statistical sampling methods, the inspector selects a set of 37 switches from a given batch, to assess the life of the contacts. These contacts can burn out, resulting in the switch failing to close the circuit when in the closed position. In assessing the time-to-failure of switches, a good measure is the number of operations in service. The test consists of repeatedly moving the switch between the on and off positions under full load current conditions. During the test, we operate the switch at a much higher frequency than expected normally.
As the test progresses, the inspector records the failures against the number of operations. When measuring life performance, time-to-failure may be in terms of the number of cycles, number of starts, distance traveled, or calendar time. We choose the parameter most representative of the life of the item. In our example, we measure ‘time’ in units of cycles of tests. The test continues until all the items have failed. Table 3.1 shows a record of the switch failures after every thousand cycles of operation.
We can plot this data as a bar chart (see Figure 3.1), with the number of switch failures along the y-axis, and the life measured in cycles along the x-axis.
To find out how many switch failures occurred in the first three thousand cycles, we add the corresponding failures, namely 0 + 1 + 3 = 4. By deducting the cumulative failures from the sample size, we obtain the number of survivors at this point as 37 − 4 = 33. As a percentage of the total number of recorded failures, the corresponding figures are 4/37 or approximately 11% and 33/37 or approximately 89% respectively.
Table 3.1
Figure 3.1 Number of failures recorded per cycle.
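The running totals used above can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and the list holds just the three interval counts quoted in the text (0, 1, and 3), not the full contents of Table 3.1.

```python
from itertools import accumulate

SAMPLE_SIZE = 37  # switches on test (from the text)

# Failures in each of the first three 1,000-cycle intervals (from the text).
# The remaining rows of Table 3.1 are not reproduced here.
failures_per_interval = [0, 1, 3]


def cumulative_failures(counts):
    """Running total of failures at the end of each interval."""
    return list(accumulate(counts))


def survivors(counts, sample_size):
    """Items still working at the end of each interval."""
    return [sample_size - c for c in cumulative_failures(counts)]


if __name__ == "__main__":
    cum = cumulative_failures(failures_per_interval)
    surv = survivors(failures_per_interval, SAMPLE_SIZE)
    for k, (c, s) in enumerate(zip(cum, surv), start=1):
        print(f"{k * 1000:>6} cycles: {c} failed ({c / SAMPLE_SIZE:.0%}), "
              f"{s} surviving ({s / SAMPLE_SIZE:.0%})")
```

Extending the list with the later rows of the table would give the cumulative failures and survivors at every thousand-cycle mark.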
We can view this information from a different angle. At the end of three thousand cycles, about 11% of the switches have failed and 89% have survived. Can we use this information to predict the performance of a single switch? We could state that a switch that had not failed during the first three thousand cycles had a survival probability of approximately 89%. Another way of stating this is to say that the reliability of the switch at this point is 89%. There is no guarantee that the switch will last any longer, but there is an 89% chance that it will survive beyond this point. As time passes, this reliability figure will keep falling. Referring to Table 3.1, we can see that at the end of five thousand cycles,
• The cumulative number of failures is 17;
• The proportion of cumulative failures to the sample size (37) is 46%;
• The proportion of survivors is about 100% − 46% = 54%.
In other words, the reliability is about 54% at the end of five thousand cycles. Using the same method, by the end of nine thousand cycles the reliability is less than 3%.
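As a minimal sketch of this calculation, the reliability at any point is simply one minus the ratio of cumulative failures to the sample size. The function name `reliability` is our own choice for illustration; the inputs are the cumulative failure counts quoted in the text.

```python
def reliability(cumulative_failures, sample_size):
    """Estimated survival probability: the fraction of the sample still working."""
    if not 0 <= cumulative_failures <= sample_size:
        raise ValueError("cumulative failures must lie between 0 and the sample size")
    return 1 - cumulative_failures / sample_size


if __name__ == "__main__":
    SAMPLE_SIZE = 37
    # Cumulative failures quoted in the text at 3,000 and 5,000 cycles.
    for cycles, failed in [(3000, 4), (5000, 17)]:
        print(f"R({cycles}) = {reliability(failed, SAMPLE_SIZE):.0%}")
```

The same function applies at any later point in the test once the cumulative failure count is read from the table.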
How large should the sample be, and will the results be different with a larger sample? With a homogeneous sample, the actual percentages will not change significantly, but the confidence in the results increases as the sample becomes larger. The cost of testing increases with the sample size, so we have to find a balance and get meaningful results at an acceptable cost. With a larger sample, we can get a better resolution of the curve, as the steps will be smaller and the histogram will approach a smooth curve. We can normalize the curve by dividing the number of failures at any point by the sample size, so that the height of the curve shows the failures as a ratio of the sample size. The last column of Table 3.1 shows these normalized figures.
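To illustrate both points, the sketch below normalizes interval counts by the sample size and estimates how the uncertainty in an observed proportion shrinks as the sample grows. The standard-error formula assumes a simple binomial model, which is our assumption and not a method given in the text; the larger sample sizes are hypothetical, and the function names are our own.

```python
import math


def normalize(counts, sample_size):
    """Express each interval's failures as a fraction of the sample size."""
    return [c / sample_size for c in counts]


def standard_error(p, sample_size):
    """Standard error of an observed proportion p under a binomial model."""
    return math.sqrt(p * (1 - p) / sample_size)


if __name__ == "__main__":
    p = 4 / 37  # proportion failed by 3,000 cycles (from the text)
    for n in (37, 370, 3700):  # the larger sample sizes are hypothetical
        print(f"n = {n:>4}: {p:.1%} +/- {standard_error(p, n):.1%}")
```

Under this model the observed proportion stays the same while the uncertainty falls with the square root of the sample size, so a sample one hundred times larger narrows the band by a factor of ten.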
3.2 PROBABILITY DENSITY FUNCTION
This brings us to the concept of probability density functions. In the earlier example, we can smooth the histogram in Figure 3.1 and obtain a result as seen in Figure 3.2. The area under the curve represents the 37 failures, and is normalized by dividing the number of failures at any point by 37, the sample size. In reliability engineering terminology, we call this normalized curve a probability density function or pdf curve. Because we tested all the items in the sample to destruction, the ratio of the total number of failures to the sample size is 1. The total area under the pdf curve represents the proportion of cumulative failures, which is also 1.
Figure 3.2 Probability density function.
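The normalization can be written compactly as follows. The symbols are our own notation, not taken from the text: n_i is the number of failures recorded in the i-th interval, N = 37 is the sample size, and f is the pdf.

```latex
\[
  f_i = \frac{n_i}{N},
  \qquad
  \sum_i f_i = \frac{1}{N}\sum_i n_i = \frac{N}{N} = 1,
  \qquad
  \int_0^{\infty} f(t)\,dt = 1 .
\]
```

The sum applies to the histogram, where every item in the sample eventually fails; the integral is the corresponding statement for the smoothed curve.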
If we draw a vertical line at time t = 3,000 cycles, the height of the curve gives the number of failures as a proportion of the sample size, at this point in time. The area to the left of this line represents the cumulative failure probability of 11%, or the chance that 4 of the 37 items would have failed. The area to the right represents the survival probability of 89%. In reliability engineering terminology,