Any one could tell me when I should use CPK or PPK? THKS!
 This topic has 92 replies, 61 voices, and was last updated 12 years ago by Stevo.


June 6, 2001 at 4:00 am #66871
sundeep singh (Member)
Cpk is for short term
Ppk is for long term
If we feel that short-term data can solve our problem we can use Cpk; otherwise Ppk.

June 6, 2001 at 4:00 am #66873
Praneet (Participant)
Hi Preston,
I shall attempt to answer your query… Basically let us get the fundamentals clear.
Cpk = Process Capability Index, an index of how good your process is. Ppk = Process Performance Index; this index basically tries to verify whether the sample you have generated from the process is capable of meeting customer CTQs (read: requirements). It differs from process capability in that process performance applies only to a specific batch of material. Samples from the batch may need to be quite large to be representative of the variation in the batch. Process performance is used only when process control cannot be evaluated; an example of this is a short preproduction run.
Process Performance generally uses the sample sigma in its calculation; Process Capability uses the process sigma value determined from either the Moving Range, Range, or Sigma control charts. The calculated Process Performance and Process Capability indices will likely be quite similar when the process is in statistical control.
One more suggestion: instead of manually computing the Cpk and Ppk figures, try using Minitab, a statistical software package, for your computations.
When you enter the data for the process capability study, the software generates a graph which reports both the Cpk and the Ppk.
Hope I have helped you in clearing your doubts about Cpk & Ppk
Regards
Praneet

June 6, 2001 at 4:00 am #66885
Preston,
Consider that if the data are in time order, Cpk would be the estimate of choice. If the data are not in time order, or are randomly distributed, then Ppk is your alternate estimate of choice.
If the data are in time order then both estimates can be made. While there is debate over the value of one versus the other, the general line of thought is that Ppk gets closer to what the customer ultimately sees as the output from your process. You should always try to look at both estimates, if possible, before making an official report. If Ppk varies by more than about 20% from Cpk, then ensure the data are statistically stable. If the data are not stable, then it is best not to report these estimates until you can identify and remove the sources of special-cause variation.
Ken
June 6, 2001 at 4:00 am #27362
preston (Participant)
As we all know, CPK means “production capacity” and PPK means “production performance”. What I want to know is whether there is any difference in application conditions between them, and whether we could use both of them at the same time even though they have different calculating formulas.
June 7, 2001 at 4:00 am #66895
Michael Whaley (Participant)
Preston:
Ppk produces an index number (like 1.33) for the process variation. Cpk references the variation to your specification limits. If you just want to know how much variation the process exhibits, a Ppk measurement is fine. If you want to know how that variation will affect the ability of your process to meet customer requirements (CTQs), you should use Cpk. As a practical consideration, the Ppk study requires less work. In Six Sigma, I prefer to do several Ppk studies to identify the points in the process with the highest variation. After attacking those processes, I’ll do a Cpk study to confirm the process’s ability to consistently meet (or exceed) the customer’s requirements.
Hope this helps,
Michael
June 7, 2001 at 4:00 am #66902
Interesting discussion. I find that the similarities and differences between Ppk and Cpk can give a lot of insight into a process and how to approach improving it.
Cpk is calculated using an estimate of the standard deviation computed as Rbar/d2.
Ppk uses the usual form of the standard deviation, i.e. the square root of the variance, or the square root of the sum of squares divided by n − 1. The Rbar/d2 estimation of the standard deviation has a smoothing effect, and the Cpk statistic is less sensitive than Ppk to points which are further away from the mean.
As Ppk and Cpk converge, the reliability of the process increases. This has implications for the effectiveness of your experimentation.
It could be argued that Ppk and Cpk (with sufficient sample size) are far more valid estimates of long- and short-term process capability, since the 1.5 sigma shift has a shaky statistical foundation.
I prefer to use Ppk when speaking to our internal customers when determining the performance of the process and when setting goals and metrics.
Hope this is useful.
Best wishes,
Eoin

June 7, 2001 at 4:00 am #66907
Jim Parnella (Participant)
Preston,
To add to the information already posted:
Cpk is calculated using Rbar/d2 or Sbar/c4 for sigma in the denominator of your equation. This calculation for sigma REQUIRES the process to be in a state of statistical control. If not in control, your calculation of sigma (and hence Cpk) is useless; it is only valid when in control. Cpk is called the “process CAPABILITY index”. Think of it this way: Cpk tells you what the process is CAPABLE of doing in the future, assuming it remains in a state of statistical control. Ppk uses the sample standard deviation in the denominator. It does not require statistical control to be valid. Ppk is called the “process PERFORMANCE index”. Think of it this way: Ppk tells you how the process has performed in the past. You cannot use it to predict the future, as you can with Cpk, because the process is not in a state of control.
Think of it a different way: You can use Cpk as a measure of PAST performance and FUTURE capability (assuming statistical control is maintained). Ppk however can only be used as a measure of PAST performance.
Also, as stated previously, the values for Cpk and Ppk will converge to almost the same value when the process is in statistical control. That is because sigma and the sample standard deviation will be identical (at least as far as can be distinguished by an F-test). When out of control, the values will be distinctly different, perhaps by a very wide margin.
I hope this helps you decide which to use.
Jim
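Jim's two denominators can be put side by side in a short sketch. This is an editorial illustration, not from the thread: the subgroup data, the spec limits, and the d2 constants are invented for the example. Cpk estimates sigma from the average subgroup range via Rbar/d2, while Ppk uses the sample standard deviation of all the data.

```python
import statistics

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}  # d2 control-chart constants

def cpk_ppk(subgroups, lsl, usl):
    """Return (Cpk, Ppk) for a list of equal-size subgroups."""
    n = len(subgroups[0])
    flat = [x for sg in subgroups for x in sg]
    mean = statistics.mean(flat)
    # Cpk: within-subgroup sigma estimated as Rbar/d2 (short-term)
    rbar = statistics.mean(max(sg) - min(sg) for sg in subgroups)
    sigma_within = rbar / D2[n]
    # Ppk: overall sample standard deviation of all data (long-term)
    sigma_overall = statistics.stdev(flat)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
    ppk = min(usl - mean, mean - lsl) / (3 * sigma_overall)
    return cpk, ppk

# Two illustrative subgroups of 5, with made-up specs LSL = 7, USL = 13
cpk, ppk = cpk_ppk([[10, 11, 9, 10, 10], [10, 9, 11, 10, 10]], 7, 13)
print(cpk, ppk)
```

For this tiny dataset the two estimates already differ, because the range-based sigma and the overall sample sigma are not equal even for stable-looking data.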
June 7, 2001 at 4:00 am #66911
ed rosa (Participant)
I’ve always reported Ppk:
1. At the beginning of a long-range production run of a part, so I could capture the assignable and non-assignable causes.

Cpk (process in control) is calculated and reported:

2. After finding the “root causes” and eliminating them from the process.

Absolutely important:
3. Now let’s assume that there are safety or critical dimensions or parameters in the part, and the process still produces “stragglers”, which are detected and removed from the system, and not included in the chart, so the process charts show “in control”. Therefore a reasonable Cpk is reported. The customer will sooner or later find out that this is happening, will be alarmed that your process is not robust, and will require that a Ppk, which includes the stragglers, be reported, so the actual extent of a potential problem can be quantified. Avoid this situation at all costs!

June 9, 2001 at 4:00 am #66935
SAMIR MISTRY (Member)
A Ppk analysis is a must to understand the process behaviour and hence its performance. The process should remain untouched while doing the study. The behaviour should then be studied to remove all assignable causes until the achieved Ppk value is >= 1.67. After achieving Ppk 1.67, the process behaviour is known, all assignable causes having been killed. Use the calculated control limits from the Ppk chart for ongoing process capability (Cpk). It then becomes a monitoring tool for whether the process operates within the given control limits, which are realistic and statistically calculated.
June 11, 2001 at 4:00 am #66970
Kim Niles (Participant)
Montgomery states in his book (Montgomery, Douglas C., Introduction to Statistical Quality Control, 4th ed., Wiley & Sons, New York, 2001, p. 372) that in 1991 the Automotive Industry Action Group (AIAG) was formed, with one of its objectives being to standardize industry reporting requirements. He says that they recommend Cpk when the process is in control and Ppk when it isn’t. Montgomery goes on to get really personal and emotional about this, which is unique to this page among this book and the other books of his that I have. He thinks Ppk is baloney, as he states: “Ppk is actually more than a step backwards. They are a waste of engineering and management effort – they tell you nothing.”
While Montgomery gets frustrated over the use of Ppk, he does a poor job of explaining what a stable process is. I respect his works just the same, as he alone has at least attempted to explain the difference between a stable process and a non-stable one.
I plan to continue this debate under a different heading (see: What is a stable process?).
I hope this helps.
Sincerely,
KN – http://www.znet.com/~sdsampe/kimn.htm

July 10, 2001 at 4:00 am #67497
Chantal (Participant)
Preston,
In fact there are 4 such measures: Cp, Cpk, Pp and Ppk. The first 2 are for computing the index with respect to the subgrouping of your data (different shifts, machines, operators, etc.), while the last 2 are for the whole process (no subgrouping). For both Cpk and Ppk the “k” stands for the “centering factor”: it means the index takes into consideration the fact that your data may not be centered (and hence your index will be smaller).
It is more realistic to use Pp and Ppk than Cp or Cpk, as the process variation cannot be tampered with by inappropriate subgrouping. However, Cp and Cpk can be very useful for knowing whether, under the best conditions, the process is capable of fitting into the specs or not. They basically give you the best-case scenario for the existing process.
I hope this helped you a bit. If you have any more questions, don’t hesitate to contact me.
Chantal
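Chantal's point about the centering factor “k” can be illustrated with a small sketch. The sigma, means, and spec limits below are made up for the example: Cp ignores where the process is centered, while Cpk shrinks as the mean drifts toward a specification limit.

```python
def cp(sigma, lsl, usl):
    # Cp: potential capability, ignores centering
    return (usl - lsl) / (6 * sigma)

def cpk(mean, sigma, lsl, usl):
    # Cpk: penalizes an off-center mean via the nearer spec limit
    return min(usl - mean, mean - lsl) / (3 * sigma)

# Hypothetical process: sigma = 1, specs 4..16 (midpoint 10)
print(cp(1.0, 4, 16))         # same value regardless of the mean
print(cpk(10.0, 1.0, 4, 16))  # equals Cp when perfectly centered
print(cpk(13.0, 1.0, 4, 16))  # drops once the mean drifts to 13
```

With the process centered, Cp and Cpk agree; shift the mean three sigma toward the upper limit and Cpk halves while Cp is unchanged, which is exactly the “best-case scenario” reading of Cp.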
August 21, 2002 at 5:35 pm #78282
Binh Nguyen (Participant)
Hi Chantal,
Why do I see in the spec requirements for some tests Ppk > 1.66 or Cpk > 1.33? For the calculation of Cpk and Ppk, what is the difference in formula? Or are they the same except for the quantity of samples?
Another question: could I use the DE histogram to calculate Ppk, or only Cpk?
Thank you in advance.
B. Nguyen
August 22, 2002 at 4:59 am #78287
Hi Binh,
Ppk and Cpk are similar; the only difference is the method by which we calculate the sigma for them.
Normally we just use Cpk with the formula we practiced: min(mean – LSL, USL – mean) / (3 × sigma).
Only if you have many sample data in small subgroups will you see Ppk and Cpk together.
For sigma in Cpk, it is calculated as Rbar/d2.
For sigma in Ppk, it is calculated from all the individual data.
Hope this is helpful.
Vincent, China
August 24, 2002 at 1:11 am #78382
Binh Nguyen (Participant)
Thanks Vincent,
Another question: How could I choose the sample size?
If I have 25 wafers in 1 lot and each wafer has 2000 die,
could I calculate Ppk from a sample of 2000 units from 1 wafer to represent the lot? How big should the sample size be to get the Ppk?
Best Rgds.
B. Nguyen

June 4, 2003 at 6:15 pm #86669
E Hargrove (Participant)
Under “ideal” conditions the results of Cpk and Ppk should be the same. Ideal conditions being: normal distribution, sufficient data set size, non-autocorrelated data, data in statistical control.
Strictly speaking, the use of the moving range is an inferential estimate of the standard deviation. Anything that would cause moving range / d2 to deviate from the standard deviation would, again strictly speaking, invalidate the use of the moving range and therefore Ppk. However, the characteristic that Ppk is less sensitive to these deviations does provide some pragmatic uses of Ppk, whether it’s theoretically “valid” as an inferential for Cpk or not. These results would only have merit, or use, if the differences were compared against other Ppk results. In other words, if repeatability or trend improvement is the primary concern then Ppk is quite adequate, but if statistical accuracy is required Cpk will be the most “defendable.”
Ppk is less sensitive to deviations from these ideal conditions because it is based on a linear equation, whereas Cpk is based on a second-order equation.
The choice of which to use depends on how sensitive you wish to be to these non-ideal conditions occurring in your data. If you prefer stable results where long-term trends are of more interest, then Ppk is preferable. If you prefer quick notification of variation, then Cpk may be preferable. If you choose Cpk and you have frequent incursions of variability, then your Cpk may involve a large amount of noise or variation within itself.
In control applications in the chemical, refining, and pulp & paper industries, I prefer to use Ppk since it is more stable with respect to the multiple sources of variability in the process and the usually lengthy amount of time needed to put a corrective action in place. Particularly relevant is that infrequent but “sharp” spikes in data (common in many control situations) will translate into very large changes in Cpk, whereas Ppk will mitigate these to, what I consider, a more meaningful qualitative assessment of the control.
It may help to put forth some effects of non-ideal conditions:
Non-normal distributions: for a normal distribution, Cpk = 1.0 implies that 0.27% of the product will be off-spec. For a non-normal distribution, Cpk = 1.0 implies that the off-spec product will be equal to the sum of the % below −3s and the % above +3s. While the tails of the non-normal distribution “could” sum to 0.27%, it is unlikely. Ppk uses the value of d2 for its calculation, and d2 is only valid for normal distributions. So strictly speaking, Ppk should not be used for non-normal distributions, but again there are useful applications for comparing Ppk values amongst themselves, as long as you understand that you have departed from using Ppk to infer Cpk.
Having an insufficient amount of data will impact both values adversely, but I believe the influence of the square root in the Cpk calculation for the standard deviation would be more likely to require a larger data set to settle on a stable value.
Autocorrelated data: as long as the data set is large enough, autocorrelation should not significantly impact Cpk. However, autocorrelated data will have a severe impact on Ppk. If the data is autocorrelated, the moving ranges will tend to be smaller, thus understating the implied standard deviation and overstating the respective implied Cpk.
Data not in statistical control invalidates both Cpk and Ppk, and neither result should be considered reliable.
Hope some of this helps. Any comments are more than welcome.
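The autocorrelation point above is easy to demonstrate by simulation. This sketch is editorial, not from the post: it generates AR(1) data (coefficient and seed invented) and compares the moving-range sigma estimate with the overall sample standard deviation, showing how positive autocorrelation shrinks the moving ranges and would overstate a range-based capability index.

```python
import random
import statistics

random.seed(1)
phi = 0.9          # AR(1) coefficient: strong positive autocorrelation
x, val = [], 0.0
for _ in range(2000):
    val = phi * val + random.gauss(0, 1)
    x.append(val)

# Sigma estimated from the average moving range (d2 = 1.128 for n = 2)
mrbar = statistics.mean(abs(a - b) for a, b in zip(x, x[1:]))
sigma_mr = mrbar / 1.128

# Overall sample standard deviation of the same data
sigma_s = statistics.stdev(x)

# With phi = 0.9 successive points sit close together, so the moving-range
# estimate comes out far below the overall sigma, and a capability index
# built on it would look much better than the process really is.
print(sigma_mr, sigma_s)
```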
June 4, 2003 at 8:52 pm #86671
Gabriel (Participant)
Some comments:
We seem to have inverted definitions of Cp and Pp. For me (and for most people in this forum) Cp uses Rbar/d2 and Pp uses the square root of the variance (ref: AIAG’s SPC manual). If you swap Cpk for Ppk in your text, most things look fine to me.
The estimation of sigma using the square root of the variance is considerably more efficient than using Rbar/d2. That means that, to obtain the same confidence interval width for the estimation, you will require a larger sample size when using Rbar/d2. If I understood correctly, this is the opposite of what you said.
You said: “Non-normal distributions: for a normal distribution, Cpk = 1.0 implies that 0.27% of the product will be off-spec. For a non-normal distribution, Cpk = 1.0 implies that the off-spec product will be equal to the sum of the % below −3s and % above +3s.” This is correct only if Cp = 1.0 too. For example, if Cp = 10.0 and Cpk = 1.0 you will have only 0.135% off-spec if normal; the other specification limit would be too far from the distribution to have part of the tail beyond it.

June 5, 2003 at 6:23 am #86686
Hemanth (Participant)
Hi Preston. Hmmm… a VERY INTERESTING discussion. I got many new things to learn here. BUT if I were you I would go back and check some AUTHENTIC literature on capability (I strongly recommend that). One such book is by Montgomery, which is mentioned in one of the messages.
Now it’s my turn to confuse (I am sorry for that), but as I understand it, the Ppk index is used when you are validating a newly developed process, something you do by conducting a PPAP (refer to a QS9000 manual). The Cpk index is used when you have a continuous production process and you wish to know if your process is still capable of meeting your specs. I would suggest: if you are conducting a PPAP, use the Ppk index, and when you have a continuous process, use the Cpk index for regular monitoring. Don’t monitor them simultaneously; it will only confuse you.

December 16, 2003 at 8:04 am #93599
Chandrasekar (Participant)
Hello sir,
Cpk is the long-term study. Ppk is the short-term study.
Taking 25 to 50 pieces = Ppk.
Taking one month of data = Cpk. (Please correct your message.)
Hello sir, only one doubt: why in particular does Six Sigma use the Pp value formula?
Thanks & Regards,
Chandrasekar

December 16, 2003 at 12:54 pm #93610
Use Cpk always!
Cpk tells you what is happening NOW!
Ppk only gives you a history lesson; in a dynamic, ever-changing process, data from a year ago is of little value.

December 16, 2003 at 4:03 pm #93620
Gabriel (Participant)
Nonsense.
December 16, 2003 at 4:47 pm #93622
Ron,
Sigma long-term is what the customer ultimately sees (Ppk), so I wouldn’t necessarily throw it out. Cpk, while very useful, is a snapshot in time of your process, and is used ONLY in the case where the process is in a state of statistical process control (all variation can be explained by common causes alone).
Mark

December 17, 2003 at 7:17 pm #93663
Rick Pastor (Member)
Kim:
I am not sure why Montgomery would say that Ppk is a step backward. I have several of his books but not the one in question. I would love to know how Montgomery defines Cpk and Ppk and to see the section in question.
The fact is, if you calculate Cpk and Ppk using the same set of data, the two values should be very close. The difference between the two lies in how you calculate the standard deviation. In what I call the old days, before everyone had computers, the standard deviation was calculated using a shortcut: Rbar/d2. With computers there is no need to use a shortcut to calculate the standard deviation.
Short-Term vs Long-Term Capability: here is where, I think, the confusion gets even worse. Since the same set of data gives nearly the same value for Cpk and Ppk, we have to ask what short-term and long-term capability really mean.
Short-term: examines the data over a short period of time, for example last week’s data, or for a high-volume process perhaps the data collected in a single day.
Long-term: examines the data over a longer period of time, say a month or a year.
The longer-term capability takes into account the drift in the mean. In contrast, the short term is a snapshot of the process. Clearly, a plot of short-term capability as a function of time shows how the capability changed over time. The long-term capability summarizes the change. It is how you use the data to improve the process that is really important.

December 18, 2003 at 8:25 pm #93695
Gabriel (Participant)
No. The difference between the two methods of estimating the standard deviation is that Rbar/d2 is insensitive to shifts in the mean, because it is based on the ranges alone.
You said: “The fact is, if you calculate Cpk and Ppk using the same set of data, the two values should be very close.” This is true only if there are no shifts in the mean. Have a look:
[The two data tables were garbled in the archive; the recoverable summary statistics are:]

Dataset #1 (10 subgroups of size 5): Rbar/d2 = 1.15, S = 1.11; UCL(Xbar) = 11.72, LCL(Xbar) = 8.62; UCL(R) = 5.75, Rbar = 2.68, LCL(R) = 0.00.

Dataset #2 (10 subgroups of size 5): Rbar/d2 = 1.15, S = 1.56; UCL(Xbar) = 11.62, Xbarbar = 10.07, LCL(Xbar) = 8.52; UCL(R) = 5.75, Rbar = 2.68, LCL(R) = 0.00.

Each dataset has 10 subgroups of size 5 (total 50 data). Below the individual values you can see the values of Xbar together with Xbarbar and the control limits. And below the Xbar data you can see the values of R together with Rbar and the control limits. Note that the Xbarbar and the control limits for Xbar of both sets are about the same
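Gabriel's construction (identical within-subgroup variation, shifted subgroup means) can be reproduced by simulation. The numbers below are invented, not his: dataset B reuses dataset A's values but adds a constant offset to each subgroup, so the ranges, and therefore Rbar/d2, are unchanged, while the overall sample standard deviation S inflates.

```python
import random
import statistics

random.seed(7)
D2_N5 = 2.326  # d2 constant for subgroups of size 5

# Dataset A: 10 stable subgroups of 5 drawn from N(10, 1)
a = [[random.gauss(10, 1) for _ in range(5)] for _ in range(10)]
# Dataset B: same values, but each subgroup's mean is shifted by a constant
offsets = [-1.5, 1.2, 0.0, -0.8, 1.5, 0.5, -1.2, 0.9, -0.4, 1.1]
b = [[x + off for x in sg] for sg, off in zip(a, offsets)]

def rbar_d2(subgroups):
    # Within-subgroup sigma estimate: insensitive to shifts between subgroups
    return statistics.mean(max(sg) - min(sg) for sg in subgroups) / D2_N5

def overall_s(subgroups):
    # Overall sample standard deviation: picks up the between-subgroup shifts
    return statistics.stdev([x for sg in subgroups for x in sg])

# Per-subgroup shifts leave every range unchanged, so the two range-based
# estimates agree, while S is clearly larger for the shifted dataset.
print(rbar_d2(a), rbar_d2(b))
print(overall_s(a), overall_s(b))
```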
December 19, 2003 at 2:05 pm #93706
Gabriel (Participant)
Somehow the last paragraph was truncated when posting… Here is the full version:
Each dataset has 10 subgroups of size 5 (total 50 data). Below the individual values you can see the values of Xbar together with Xbarbar and the control limits. And below the Xbar data you can see the values of R together with Rbar and the control limits. Note that the Xbarbar and the control limits for Xbar of both sets are about the same. The values of R are identical in both sets, so Rbar and the control limits for R are also identical. In set #1, the values of Rbar/d2 and S (the sample standard deviation of all 50 individual values together) are very close. Not only that, S is even slightly smaller than Rbar/d2 (something many people think and say is impossible). In set #2, Rbar/d2 is identical to its value in set #1 (it must be, since Rbar was identical), but S is 36% larger. Finally, note that all values of Xbar and R are within control limits in both sets.
So your statement “if you calculate Cpk and Ppk using the same set of data, the two values should be very close” is true in one case, but not in the other. Do you know why?

December 19, 2003 at 8:56 pm #93721
Hi Gabriel,
Is it because process B has a 1.5 sigma mean shift? :)
Just to add to your point. The variation calculated within the subgroups for Cpk is an estimate of the inherent, or entitlement, level of variation. The variation calculated as the RMS of all the data is an estimate of the total variation, including any temporal, cyclical, or transient special-cause variation. There is risk in using Ppk as a prediction of future process performance, particularly if the special causes are primarily of the random transient variety. Even though Ppk will take into account the drifts or shifts over the period of data collection, there is no way to predict that the pattern of special causes will persist in the future. I think this is why Montgomery gets frustrated over the use of Ppk.
One good use of these metrics is in a components-of-variation context. If I perform a COV on your data:
Process A:
Variance Components
Source    Var Comp.   % of Total   StDev
Between   0.014*      0.00         0.000
Within    1.256       100.00       1.121
Total     1.256                    1.121
Process B:
Variance Components
Source    Var Comp.   % of Total   StDev
Between   1.274       50.37        1.129
Within    1.256       49.63        1.121
Total     2.530                    1.591
You can see that process A is performing at its level of entitlement, whereas process B sees 50.37% of its total variation contributed by process management (for lack of a better term), with the same potential performance level (entitlement) as process A.
The proper use of Ppk should be to assess the amount of improvement that can be made through DMAIC projects versus the need for process redesign. Makes more sense than adding a 1.5 shift to process A.
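Statman's variance-components tables can be reproduced with a standard one-way ANOVA decomposition. This is a minimal sketch on a tiny made-up dataset (not the thread's data): the within component estimates the entitlement variation, and the between component captures the extra variation from subgroup-to-subgroup shifts.

```python
import statistics

def variance_components(subgroups):
    """One-way ANOVA variance components: (between, within) estimates."""
    g, n = len(subgroups), len(subgroups[0])
    grand = statistics.mean(x for sg in subgroups for x in sg)
    means = [statistics.mean(sg) for sg in subgroups]
    ss_w = sum((x - m) ** 2 for sg, m in zip(subgroups, means) for x in sg)
    ss_b = n * sum((m - grand) ** 2 for m in means)
    ms_w = ss_w / (g * (n - 1))            # within-subgroup variance
    ms_b = ss_b / (g - 1)
    between = max(0.0, (ms_b - ms_w) / n)  # negative estimates truncated to 0
    return between, ms_w

# Stable process: subgroup means agree, so the between component is zero
print(variance_components([[9, 11], [9, 11]]))
# Shifted process: same within variation, large between component
print(variance_components([[9, 11], [13, 15]]))
```

As in Statman's tables, the stable process puts essentially 100% of its variation in the within component, while the shifted process splits its total between the two.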
Cheers,
Statman

December 21, 2003 at 9:20 pm #93735
Rick Pastor (Member)
You, Gabriel, have provided two excellent sets of data. The two sets of data show why estimating sigma from Rbar/d2 is so dangerous (d2 = 2.326 for a subgroup of 5). Estimating s from Rbar/d2 assumes that the distribution is normal. The following will show that it is better to use the definition of the standard deviation, the formula, to calculate s.
Before I go through what I have seen in the data, let me state that I used both JMP and Excel for my calculations. Both data sets were examined in two ways: a distribution analysis and a control chart. Furthermore, I make the arbitrary assumption that the USL and LSL equal 12.4 and 7.4 respectively.
Data Set 1: The data is normally distributed and Cpk ≈ Ppk.
Distribution analysis:
1. 10 subgroup averages: goodness-of-fit p-value = 0.8061 (JMP)
2. 50 individual points: goodness-of-fit p-value = 0.8598 (JMP)
3. Summary: The data is normally distributed.
Control Chart:
4. Subgroup size of 5, applying Shewhart’s eight (8) tests: the data indicates that the process is in control. The data passed all eight tests. (JMP)
Excel Calculations:
5. Cpk= 0.644 (Excel)
6. Ppk= 0.668 (Excel) (agrees with the value calculated in JMP)
7. Summary: Cpk ≈ Ppk
Data Set 2: The data is not normally distributed and Cpk > Ppk.
Distribution analysis:
1. 10 subgroup averages: goodness-of-fit p-value = 0.0601 (JMP)
2. 50 individual points: goodness-of-fit p-value = 0.0098 (JMP)
3. Summary: The data is not normally distributed.
Control Chart:
4. Subgroup size of 5, applying Shewhart’s eight (8) tests: the data indicates a process that is out of control. (Two out of three points in a row in Zone A or beyond detects a shift in the process average or an increase in the standard deviation; any two out of three such points give a positive test.) (JMP)
Excel Calculations:
5. Cpk= 0.673 (Excel)
6. Ppk= 0.499 (Excel) (agrees with the value calculated in JMP)
7. Summary: the process is not stable; Cpk > Ppk
When a process is not stable and the data is not normally distributed, the value of Cpk can be larger than Ppk. What this means is that the Rbar/d2 estimate of s does not pick up the shift in the process mean. In contrast, the calculated value of s captures the variability of the process and gives a smaller value for Ppk.
Gabriel, I stand corrected for failing to reveal the assumptions behind my statement. Cpk and Ppk are roughly equal when the same set of data is used and the data is normally distributed. In other words, a process that is stable, with normally distributed data, gives Cpk and Ppk that are roughly equal. Therefore, Cpk for a process that is not stable can make the process appear better than it actually is. That is why reported values of Cpk or Ppk should include the calculation method.
Because many managers and engineers are barely familiar with Cpk and even less familiar with Ppk, I calculate Ppk while I use the word Cpk in discussions.
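A quick numeric check of a figure quoted several times in this thread: assuming normality, Cpk = 1.0 implies roughly 0.27% off-spec, and about half that when the other specification limit is far away (i.e. when Cp is large). This sketch computes the expected fraction from the normal CDF; the mean, sigma, and spec values are invented for the example.

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def offspec_fraction(mean, sigma, lsl, usl):
    """Expected out-of-spec fraction assuming a normal distribution."""
    below = norm_cdf((lsl - mean) / sigma)
    above = 1.0 - norm_cdf((usl - mean) / sigma)
    return below + above

# Cp = Cpk = 1.0: both limits sit 3 sigma from the mean -> about 0.27%
two_sided = offspec_fraction(0.0, 1.0, -3.0, 3.0)
# Cp = 10, Cpk = 1.0: the USL is 57 sigma away, so only one tail matters
one_sided = offspec_fraction(0.0, 1.0, -3.0, 57.0)
print(two_sided, one_sided)
```

This matches Gabriel's earlier correction: the 0.27% figure assumes Cp = 1.0 as well; with the second limit effectively unreachable, only about 0.135% falls off-spec.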
December 22, 2003 at 4:02 pm #93757
Rick Pastor (Member)
Statman:
I confess that I am a little confused by the statement that it is better than adding a 1.5-sigma shift to process A. Nevertheless, I would say that Montgomery and I would have different points of frustration (I would love to see the exact pages where Montgomery discusses the issue). When I examine the difference between Cpk and Ppk, I just see that Cpk uses a shortcut to calculate sigma. That shortcut requires the data to be normally distributed.

December 22, 2003 at 8:35 pm #93773
Hi Rick,
The 1.5 comment was my feeble attempt at sarcasm.
I have not read Montgomery’s discussion of the issue. I am only assuming, based on my experience, why he would have an issue with how Ppk can be misused.
I understand where you are coming from on the issue of normality with Gabriel’s distribution B. However, I think there needs to be clarity that you are talking about the shape of the histogram of all the data points, not about whether the data is normal. If you subtract Gabriel’s process A data from his process B data, you will find that Gabriel has simply taken a random normal data series (process A) and added constants to the subgroups. So the underlying distribution in each process is normal, but process B is not stable due to shifting the subgroups in various directions to various degrees. Therefore, process A is a random normal and stable process, whereas process B is a process that is inherently normal but not stable. In other words, process B has subgroups with normally distributed data, but the averages of the subgroups do not differ due only to random normal variation.
There is a risk in taking the histogram of all the data and subjecting it to a normality test. A process that is not stable can have a normal shape in the histogram of all the data. A process that is stable can have a non-normal shape in the histogram. This is because you can have any of the four situations: stable/normal, stable/non-normal, not stable/normal, not stable/non-normal.
You are right that there is a risk in predicting the future capability of a process based solely on the Cpk, as the Cpk assumes a stable process. However, both Cpk and Ppk assume normality when used to predict the nonconformance rate relative to the specs. There is no advantage (from a normality perspective) in one metric over the other.
As I showed in my last post, my preference is to use a COV approach to establish the percent contribution to the process variation due to process management and to get an estimate of the entitlement level of capability. If, as in the case of Gabriel’s data, the process can be assumed inherently normal, then the Cp will indicate the best possible performance of the process and the predicted nonconformance rate at that level.
Statman

December 22, 2003 at 9:51 pm #93778
Statman Too (Member)
Based on the short-term standard deviation, we compute
Cp and Cpk. Using the long-term standard deviation, we compute Pp and Ppk. Using Rbar/d2 is a rather pointless exercise given the proliferation of computers. Besides, one should use d2* from the Mil-Stds and not d2, for reasons one can easily trace through the literature. Rather than recapitulate what has already been written, I am providing the following from the Dr. Harry Q&A forum: The capability of a process has two distinct but interrelated dimensions. First, there is short-term capability, or simply Z.st. Second, we have the dimension of long-term capability, or just Z.lt. The short-term (instantaneous) form of Z is given as Z.st = (SL − T) / S.st, where SL is the specification limit, T is the nominal specification and S.st is the short-term standard deviation. The short-term standard deviation would be computed as S.st = sqrt[SS.w / (g(n − 1))], where SS.w is the sum of squares due to variation occurring within subgroups, g is the number of subgroups, and n is the number of observations within a subgroup. It should be fairly apparent that Z.st assesses the ability of a process to repeat (or otherwise replicate) any given performance condition, at any arbitrary moment in time. Owing to the merits of a rational sampling strategy, and given that SS.w captures only momentary influences of a transient and random nature, we are compelled to recognize that Z.st is a measure of instantaneous reproducibility. In other words, the sampling strategy must be designed such that Z.st does not capture or otherwise reflect temporal influences (time-related sources of error). The metric Z.st must echo only pure error (random influences). Now considering Z.lt, we understand that this metric is intended to expose how well the process can replicate a given performance condition over many cycles of the process. In its purest form, Z.lt is intended to capture and pool all of the observed instantaneous effects as well as the longitudinal influences. Thus, we compute Z.lt = (SL − M) / S.lt, where SL is the specification limit, M is the mean (average) and S.lt is the long-term standard deviation. The long-term standard deviation is given as S.lt = sqrt[SS.t / (ng − 1)], where SS.t is the total sum of squares. In this context, SS.t captures two sources of variation: errors that occur within subgroups (SS.w) as well as those that are created between subgroups (SS.b). Given the absence of covariance, we are able to compute the quantity SS.t = SS.b + SS.w. In this context, we see that Z.lt provides a global sense of capability, not just a
slice in time snapshot. Consequently, we recognize that
Z.lt is timesensitive, whereas Z.st is relatively
independent of time. From Dr. Harrys definition, we
have Cp = SL T / S.st, Cpk = Cp*(1(MT/SLT), Pp = 
SL T / S.st, and Ppk= Pp*(1(MT/SLT). From these
equations, it would seem that Cp and Cpk are shortterm
measures of capability, while Pp and Ppk and longterm
measures. These same statistics and indices of capability
are embedded in the Minitab software under the tool
header Six Sigma and spelled out in the help menu. By
what I can see, Cpk captures static but momentary shifts
of a transient nature, while dynamic shifts and drifts is
captured within the Ppk calculation.
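These definitions can be sketched numerically. The following is my own illustration (hypothetical data, and the function name is mine, not from the Dr. Harry forum):

```python
import numpy as np

def capability_z(data, sl, t):
    """Z.st and Z.lt per the definitions above.

    data : (g, n) array -- g subgroups of n observations
    sl   : specification limit; t : nominal (target) specification
    """
    g, n = data.shape
    m = data.mean()                                        # grand mean M
    # SS.w: sums-of-squares within subgroups
    ss_w = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum()
    # SS.t: total sums-of-squares about the grand mean
    ss_t = ((data - m) ** 2).sum()
    s_st = np.sqrt(ss_w / (g * (n - 1)))                   # short-term sigma
    s_lt = np.sqrt(ss_t / (n * g - 1))                     # long-term sigma
    return abs(sl - t) / s_st, abs(sl - m) / s_lt          # Z.st, Z.lt

# Stable process: subgroup means vary only by chance, so Z.st and Z.lt are close
rng = np.random.default_rng(0)
data = rng.normal(10, 1, size=(10, 5))
print(capability_z(data, sl=13, t=10))
```

Because SS.t = SS.w + SS.b, extra between-subgroup variation inflates S.lt relative to S.st, which is what pulls Z.lt below Z.st.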
Hope this helps
Statman Too0December 23, 2003 at 4:40 pm #93802
GabrielParticipant@Gabriel Include @Gabriel in your post and this person will
be notified via email.Rick:
“When a process in not stable, the data is not normally distributed”
That’s wrong again. I’ve prepared the following dataset:
Dataset #3
[Gabriel posted the 50 values of dataset #3 here, arranged in 10 subgroups of 5 (values roughly between 5 and 13, centred near 10), together with the Rbar/d2 and S columns and the Xbar and R chart limits (UCL, Xbarbar, LCL, Rbar); the table layout did not survive extraction.]
Please make exactly the same analysis you performed for the datasets #1 and #2 and tell me what you find.
The dangerous thing is not to use one indicator or the other. The dangerous thing is not knowing what the indicator you are using means.
The lack of normality might be a cause of some minor difference between Rbar/d2 and S, because d2 is a constant defined for the normal distribution. However, there is something much more important in the concepts of Rbar/d2 and S that makes S greater than Rbar/d2 regardless of the shape of the distribution, as this example shows (the distribution of set #3 has a shape that is pretty close to that of set #1). Do you know why?
0January 5, 2004 at 3:37 am #93948
Rick PastorMember@RickPastor Include @RickPastor in your post and this person will
be notified via email.I have made the same calculation for all three sets of data. The results are below. In JMP, the distributions (all 50 points) for sets 1, 2, and 3 appear different.
Now, I have a question for you. What rules do you use to determine if a set of data represents a stable process?
Data Set 1: The data is normally distributed and Cpk ≈ Ppk.
Distribution analysis:
1. 10 subgroup averages: goodness-of-fit p-value = 0.8061 (JMP)
2. 50 individual points: goodness-of-fit p-value = 0.8598 (JMP)
3. Summary: The data is normally distributed.
Control Chart:
4. Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates that the process is in control. The data passed all eight tests. (JMP)
Excel Calculations:
5. Cpk = 0.644 (Excel)
6. Ppk = 0.668 (Excel) (agrees with the value calculated in JMP)
7. Summary: Cpk ≈ Ppk
Data Set 2: The data is not normally distributed and Cpk > Ppk.
Distribution analysis:
1. 10 subgroup averages: goodness-of-fit p-value = 0.0601 (JMP)
2. 50 individual points: goodness-of-fit p-value = 0.0098 (JMP)
3. Summary: The data is not normally distributed.
Control Chart:
4. Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates a process that is out of control. (Two out of three points in a row in Zone A or beyond detects a shift in the process average or an increase in the standard deviation. Any two out of three points provide a positive test.) (JMP)
Excel Calculations:
5. Cpk = 0.673 (Excel)
6. Ppk = 0.499 (Excel) (agrees with the value calculated in JMP)
7. Summary: the process is not stable; Cpk > Ppk
Data Set 3: The data is not normally distributed and Cpk > Ppk.
Distribution analysis:
1. 10 subgroup averages: goodness-of-fit p-value = 0.1890 (JMP)
2. 50 individual points: goodness-of-fit p-value = 0.0492 (JMP)
3. Summary: The data is normally distributed when viewed as subgroups (the central limit theorem). On the other hand, the 50-point distribution indicates that the data is not normally distributed.
Control Chart:
4. Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates that the process is not in control; subgroups 1 and 10 fail Test 1. (One point beyond Zone A detects a shift in the mean, an increase in the standard deviation, or a single aberration in the process. For interpreting Test 1, the R chart can be used to rule out increases in variation.) (JMP)
Excel Calculations:
5. Cpk = 0.711 (Excel)
6. Ppk = 0.432 (Excel) (agrees with the value calculated in JMP)
7. Summary: Cpk > Ppk
0January 5, 2004 at 4:18 pm #93958
GabrielParticipant@Gabriel Include @Gabriel in your post and this person will
be notified via email.Hi Rick,
OK, my third example has failed.
About your question to me, I use the WECO rules.
But what I wanted to point out was not how stability was determined, or whether the datasets in my examples were stable or not stable. Here is the story:
You said:
“The fact is, if you calculate Cpk and Ppk using the same set of data, the two values should be very close. The difference in the two lies in how you calculate the standard deviation. In what I call the old days, before everyone had computers, the standard deviation was calculated using a short cut — R_bar/d2. With computers there is no need to use a short cut to calculate the standard deviations.”
This is all wrong, from my point of view.
The first two examples showed that, even when coming from the same dataset, Cpk and Ppk (or Rbar/d2 and S) can be not "very close" but pretty far apart. At the end of those examples I asked if you knew why the values were close in one example and not in the other. In your answer, you said things such as:
“The two sets of data show why estimating sigma from R_bar/d2 is so dangerous (d2=2.326 for a subgroup of 5). Estimating s from R_bar/d2 assumes that the distribution is normal” … “When a process in not stable, the data is not normally distributed, the value of Cpk can be larger than Ppk” … “Gabriel, I stand corrected for failing to reveal the assumptions to my statement. Cpk and Ppk are roughly equal when the same set of data is used and the data is normally distributed.”
I think that many of these things are wrong too, and the other things are misleading given the context.
So I provided a third example that was supposed to have different values of Cpk and Ppk (or Rbar/d2 and S) and still be normally distributed (I succeeded in the first and failed in the second).
So far I have said what is wrong for me, but not why, what is correct, or how I created the examples. Here it goes (if you want a summary, go to the end of the post).
All 3 examples were created in Excel.
The first was just a set of random values from a normal distribution (µ=10, sigma=1), arranged in subgroups of size 5. So it is fully random (i.e. no special causes) and, whether they pass a normality test or not, the values do belong to a normal distribution (of course the chances are that they would pass, as they did). But normality was not intended to be an issue in this example.
The second example used exactly the same values, in the same positions, as example one, except that I added a constant to each subgroup. For example, the first subgroup of example 1 is 10.38, 9.34, 9.82, 10.00 and 10.75, while in example 2 the same subgroup was 9.38, 8.34, 8.82, 9.00 and 9.75. As you can see, the constant added in this case is −1. The constants (mostly 1, 0 or −1) were added "by hand" to each subgroup, trying to keep the average close to that of the first example and keeping the points within control limits. I know that this does not mean that the process is stable. In this case, the process is clearly not stable, since in each subgroup there is one special cause of variation (the constant that I added). Again, I hadn't even thought of "normality" in this example.
Why are Rbar/d2 and S so close in the first example but not in the second? Because in the first example the values show only random variation, while in the second example they show non-random variation. Not because the first values are normally distributed and the second values are not. It is true that the d2 values are derived for a normal distribution, but using Rbar/d2 on a non-normal distribution is a second-order error.
The key thing is that, in both examples, all ranges are identical (because adding a constant to all the values of one subgroup does not change its range), and therefore Rbar/d2 is identical. Rbar/d2 is the estimation of the "within subgroup" variation (also called "internal", "inherent", "short term"), and the within-subgroup variation is identical in both examples. There is always "between subgroup" variation (i.e. even if the process was perfectly stable, not all averages will be equal). In general, part of the between-subgroup variation can be explained by the within-subgroup variation (i.e. the subgroup averages are unequal just by chance). In the first example, all the between-subgroup variation falls in this field, since chance was the only factor involved. Then the estimation of the "total" variation, S (which includes both within-subgroup and between-subgroup variation), is very close to the estimation of the within-subgroup variation, Rbar/d2. In the second example, however, the between-subgroup variation was way beyond what could be explained by the within-subgroup variation, because there was more between-subgroup variation "added by hand" by me when I added a different constant to each subgroup. And that's why, in the second example, the total variation S is much greater than the within-subgroup variation Rbar/d2. Normality, or lack of normality, is a second-order factor here.
Shortly, when all (or most) variation is due to common causes only, Rbar/d2 and S should be pretty close. When a significant part of the total variation is due to special causes, Rbar/d2 will fall short of the total variation S.
However, this does not make Rbar/d2 "dangerous", or "useless", or "old fashioned". In fact, Rbar/d2 will tell you what you could get if you identified and eliminated the special causes of variation. If you need further improvement beyond Rbar/d2, that will not be achieved by using the control chart to identify instability, because even the stable process will not be satisfactorily capable.
After you blamed the lack of normality for the disagreement between Cpk and Ppk, I made the third example. Again, it used exactly the same values, in the same positions, as example 1, but with a constant added to the values of each subgroup. The difference was that, in this case, the constants were not arbitrarily chosen by me. Instead, I took 10 random values from another normal distribution (this one with average 0 and an SD I don't remember) and added each one of these values to the values of each subgroup. The idea was that the shift of the subgroup means would be normally distributed. Well, it seems it failed. But the concept is still there. A set of data can look normal when all the values are put together in a histogram or in a normality test, yet the process can still be unstable when you plot the data over time. And then Rbar/d2 will be smaller than S, even when the data pass a normality test. I guess there were too few subgroups and the "special" variation was too large, leaving "holes" in the histogram that made it look non-normal. But the distribution these values were taken from IS normal (it is the sum of two normal distributions).
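Gabriel's construction in examples 1 and 2 is easy to reproduce. Here is a sketch with made-up numbers (not his actual dataset, and with shifts of ±2 rather than ±1 to exaggerate the effect):

```python
import numpy as np

rng = np.random.default_rng(42)
d2 = 2.326                                   # d2 constant for subgroups of size 5

base = rng.normal(10, 1, size=(10, 5))       # "example 1": purely random
shifts = np.array([-2, 0, 2, 0, -2, 2, 0, -2, 2, 0])
shifted = base + shifts[:, None]             # "example 2": one constant added per subgroup

for name, data in (("stable", base), ("shifted", shifted)):
    rbar = np.ptp(data, axis=1).mean()       # average within-subgroup range
    print(name, "Rbar/d2 =", round(rbar / d2, 3), " S =", round(data.std(ddof=1), 3))
```

Adding a constant to a subgroup leaves its range untouched, so Rbar/d2 is identical in both runs, while the total S inflates with the between-subgroup shifts: exactly the mechanism described above.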
Summary:
– Rbar/d2 and S are two estimators of two different types of variation: respectively, the "within subgroup" variation and the "total" variation.
– When there is significantly more "between subgroup" variation than can be explained by the "within subgroup" variation, then the total variation and its estimator S will be clearly larger than the within-subgroup variation and its estimator Rbar/d2. Such a process is unstable. If the process is stable (all variation is random) then the total variation equals the within-subgroup variation, and their estimators are close.
– The gap between these two types of variation gives you an idea of the magnitude of the variation due to special causes included in the process. Such variation due to special causes can sometimes pass undetected in the control chart, so it is important to compare the two values. Rbar/d2 (or Cpk) is the best you can get from S (or Ppk) by removing the special causes of variation.
– Stability and normality are two different and independent concepts. A dataset may be stable or not and, independently, normal or not. In fact, if a process is not normally distributed, it NEEDS to remain non-normally distributed to be stable (stability = equality of behaviour over time).
– It is not true that Rbar/d2 is used instead of S because it was easier to calculate before IT was here. As said before, they are estimators of different types (or dimensions) of the process variation.
– Instead of Rbar/d2, one could use Sbar/c4 as an estimator of the within-subgroup variation. Rbar/d2 is easier to calculate without the aid of specific software, but Sbar/c4 is more efficient. In this case, it can be said that Rbar/d2 is more popular because it was easier to calculate when such software was not available. However, it must be understood that the loss of efficiency of Rbar/d2 relative to Sbar/c4 is less than 10% for subgroups of size up to 10.0January 6, 2004 at 9:38 am #93973Hi
Let me put up an actual case I came across recently, which is consistent with Gabriel’s third simulation listed below.
It concerns a production line that fills bottles of detergent. Every hour the supervisor weighed a subgroup of four bottles for control charting. The variation within each subgroup was very small.
However the bottle filling machine also had longer-term fluctuations (over many hours), themselves conforming to a normal distribution. These arose from environmental changes, the pressure of the compressed air, and so on. In any case the company didn't think it worth trying to fix it.
The supervisor was having difficulty using control charts. The limits were based on the subgroup range and so points were frequently exceeding the limits on the mean chart because of the long term fluctuations. This led to overadjusting the process (this was apparent from the data).
What were the options?
0January 6, 2004 at 3:19 pm #93980Glen,
There are a few options that I see. One is to not use a subgroup size of 4. Since the 4 readings have been shown to be very similar, the range will be small, thus the control limits narrow, thus many samples appear out of control, thus process adjustments are made, which leads to over-control. A sample size of 1 might provide enough good information to help you control, run, and improve your process.
Another option: instead of taking 4 consecutive readings, space out the time between each of the 4 readings. (This will likely cause the range to increase, and thus the control limits will widen.) The chance of an out-of-control condition will be reduced. Process adjustments will still be made, but only when the process is shown to be out of control.
Another option: use Three-Way control charts. This is discussed in Donald Wheeler's text, Understanding Statistical Process Control, second edition.
Another option: take your samples, but don't make any adjustments to the process. This won't be popular with many readers of this forum, but you will discover something about the process. It sounds like there is a lot of over-control, based on signals (out-of-control conditions), that leads to greater problems.
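The over-adjustment mechanism in Glen's bottling example can be seen numerically. A minimal sketch with assumed numbers (A2 = 0.729 is the standard Xbar-chart constant for subgroups of 4; the spread values are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
a2 = 0.729                                    # Xbar-chart constant for n = 4

within = rng.normal(0, 0.05, size=(25, 4))    # tiny within-subgroup spread
drift = rng.normal(0, 0.5, size=(25, 1))      # longer-term filler fluctuation
data = 100 + drift + within                   # 25 hourly subgroups of 4 bottles

xbar = data.mean(axis=1)
rbar = np.ptp(data, axis=1).mean()            # average subgroup range
ucl = data.mean() + a2 * rbar                 # limits driven only by tiny ranges
lcl = data.mean() - a2 * rbar
out = int(((xbar > ucl) | (xbar < lcl)).sum())
print(f"limits = mean ±{a2 * rbar:.3f}; subgroups outside limits: {out} of 25")
```

Because the limits reflect only the within-subgroup ranges, most subgroup means land outside them, inviting constant (and harmful) adjustment. Spacing out the four readings, as suggested above, widens Rbar and hence the limits.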
Matt0February 23, 2004 at 3:07 pm #95938
rich barberMember@richbarber Include @richbarber in your post and this person will
be notified via email.I’m trying to calculate our process capability: Xbar, X double bar, range, average range, Cp and Cpk. If you could email me, I could send you the data I have collected in an Excel spreadsheet.
0February 23, 2004 at 6:46 pm #95957Rich,
There are other much more qualified people on this forum than me, but if you want me to I will try and help.
[email protected]0September 30, 2004 at 3:27 am #108246
vpschroederMember@vpschroeder Include @vpschroeder in your post and this person will
be notified via email.Hi Praneet
What is your opinion on reporting cpk or ppk for chemical processes in which nominal values are maintained manually through chemical additions? We sample chemical baths twice daily and plot the results on an individuals chart. Additions are made to the baths to bring them close to their nominal values. The additions are calculated from a formula that takes into account the amount of product exposed to the chemicals.
The internal argument centers around whether we can call this a “random process” in the first place, and then it gets complicated after that when we attempt to cite Cpk values derived from the data.
What are your thoughts?
0September 30, 2004 at 3:48 am #108253I have the same concern. The data points are not obviously independent.
Things get more complicated with the frequency and timing of measurement: before chemical top-up, after chemical top-up, random?
I’d appreciate it if anyone can provide an opinion.0September 30, 2004 at 6:36 am #108259Migs,
There are many processes that have to be adjusted from run to run, such as ink mixing and electrochemical processes. These processes do not satisfy the requirements of Shewhart charts, although trend charts can be useful.
For electrochemical processes, a more important concern is the ‘state’ of the process, such as normality and activity, and the amount of ‘carryover.’ Since mass spectroscopy systems and NIR systems are relatively inexpensive these days, these can be used to monitor the amount of contamination, and the best way of doing this is to use multivariate data analysis to study the spectra (PCR.) Camo software provides chemometric software. (No association, although I have attended their course in Norway.)
A good approach was taken by a colleague who developed a VB software program to record all the monitored data and to act as a kind of ‘process supervisor.’ In other words, it examines the data and recommends what should be added and the amount. The software also provides ‘trend charts’ and other diagnostic tools. The software then recommends a corrective action and what other tests should be performed. This approach ‘transformed’ the process! Most work these days is to try to reduce variation across the load by changing the chemistry, jigs, etc.
Andy0March 30, 2005 at 10:11 am #116984
Taufik SubagjaMember@TaufikSubagja Include @TaufikSubagja in your post and this person will
be notified via email.What is Cpk Stands for ? ( Acronim of cpk )
is there anyone know what is K stands for ?0May 11, 2005 at 12:44 am #119306
Don LaDueParticipant@DonLaDue Include @DonLaDue in your post and this person will
be notified via email.I’m no expert and have forgotten quite a bit because I don’t get to use my statistics knowledge a whole lot, but I understand that the Cpk acronym is taken from the Spanish words for “process capability index”, hence the “strange” (to English speakers) order of the letters, and the “k”.
Does anybody else know for sure?0May 11, 2005 at 6:29 am #119309k stands for “katayori” it means bias …
Andy0January 4, 2006 at 9:35 pm #131902
BOISRONDParticipant@BOISROND Include @BOISROND in your post and this person will
be notified via email.Hello Chantal,
Do you understand French?
If so, can you confirm for me the formulas for Cp and Cpk:
Cp = capability expressing the sample dispersion
Cpk = capability expressing both the dispersion and the centering
Thank you
Pascal
0January 5, 2006 at 10:21 am #131918When I translate French to English,
Cpk indicates how capable the process is of producing consistent and stable output that meets customer requirements and expectations when the process is not centered but still falls on or between the specifications. Ppk does the same job, but on a long-term basis.
Hope it helps.
Eddie0January 5, 2006 at 10:34 am #131919
Flying Sigma CircusParticipant@FlyingSigmaCircus Include @FlyingSigmaCircus in your post and this person will
be notified via email.“but falls on or between specifications.”
Not a true statement. Research the cause of negative Cpk values.0January 6, 2006 at 1:40 am #131965My statement is true, but incomplete. It should be:
“but falls on, between specifications, or some distance away from the specifications.” This definition gives meaning to both +ve and −ve Cpk.
Agree?0December 28, 2006 at 4:55 pm #149600
Mike RobinsonParticipant@MikeRobinson Include @MikeRobinson in your post and this person will
be notified via email.We use a chemical software program to maintain our baths. We perform periodic tests, daily, weekly, monthly, etc. to determine chemical adds/dilutes. The software allows us to calculate CPK values for each tank.
Our issue is whether to make adds/dilutes to meet the chemical manufacturer’s min/max values or to make adds/dilutes to maintain a Cpk ≥ 1.33.
0April 24, 2007 at 4:08 pm #155188
manmath shankareParticipant@manmathshankare Include @manmathshankare in your post and this person will
be notified via email.Dear all,
The main difference between Cpk and Ppk is that the index calculated while considering both assignable and common causes is the preliminary capability index, while the index calculated considering only common (chance, or random) causes is the capability index; the latter is calculated when the process is routine.
The sigma formulas are also different: for Ppk it is the sample sigma (s), and for Cpk the estimated sigma only.
I have tried to explain this to the best of my ability.
Best Regards,
Manmath.0August 7, 2007 at 1:12 pm #159643
Robert TiptonMember@RobertTipton Include @RobertTipton in your post and this person will
be notified via email.CPk stands for …
Capability Index
“CP” for Capability of the process
Small “k” is used mathematically as an index number.
Proper use is “CPk”, not Cpk, not cpk, not CPK.
It is a method to quickly understand whether the process is acceptable or not.
0August 7, 2007 at 2:19 pm #159645
The New MBMember@TheNewMB Include @TheNewMB in your post and this person will
be notified via email.Are you sure?
0August 7, 2007 at 2:27 pm #159646
Robert ButlerParticipant@rbutler Include @rbutler in your post and this person will
be notified via email.It’s my understanding that “pk” is supposed to stand for “Process” and K is for the Japanese word “katayori” which means deviation or offset. There was a nice thread on this issue on the forum awhile back. The lead post is the following:
https://www.isixsigma.com/forum/showmessage.asp?messageID=473020August 7, 2007 at 2:30 pm #159647
Robert TiptonMember@RobertTipton Include @RobertTipton in your post and this person will
be notified via email.Yes, I am sure.
PPk is an index (k) number for the Process Potential (PP) study.
Typically, this is accomplished via a 30- to 100-piece, short-term capability study.
The most popular sample size is a 50-piece study for Process Potential capability studies.
The formula is really quite simple: PPk = Z-minimum divided by 3.
During the early stages of implementing the CPk concept, Ford’s acceptance criterion was +/- 3 sigma, or 99.73% good parts. In general, Larry Sullivan (Ford) is given credit for inventing the CPk concept. Mr. Sullivan had some perception that the Z-values and Z-tables would be too complicated to introduce widely, along with statistics, at Ford. At that time a cost-of-living index was being used at Ford. This was a simple formula: if the index was higher than 1, employees got more money. Larry thought about this simple concept of more than 1 being good. So Larry conceived the idea of dividing the minimum Z-value by 3, since the acceptance criterion at Ford was +/- 3 sigma. If the worst case (Z-minimum) was divided by 3 and the CPk value was greater than 1, all was good. This was simple to train people on: less than 1 is bad and more than 1 is good. As they say, the rest is history. Since that time Ford has increased the acceptance criterion to +/- 4 sigma (99.994%), so the minimum acceptance criterion for CPk is now 1.33. Again, the formula is easy: +/- 4 sigma divided by 3 = 1.33 CPk, minimum.
Note: Larry Sullivan went on to establish the American Supplier Institute, a fantastic training organization.
During the early 1980s General Motors was using Machine Capability Ratios (MCR) and Process Capability Ratios (PCR). Most people associated using the CPk with Ford Motor Company. There were some errors in using the General Motors formulas, primarily in the calculation of unilateral tolerances. Since we did not think G.M. would want to implement the Ford CPk standards, we corrected the G.M. formulas so they could keep their MCR and PCR identity. However, to my surprise, G.M., and now the world has incorporated the CPk as the standard. It is so much simpler to have one standard to work with, we should all be happy for that.
0August 8, 2007 at 2:32 pm #159686Mr Tipton,You may be correct about the origins and the original notations,
but you are way out of date on both the formulas and notations as
they have been used at least since the AIAG standards have been in
place. Some differences Cp, not CP
Cpk, not CPk
Pp, not PP
Ppk, not PPk
Cp and Cpk are now short term (I know you were correct about Pp
and Ppk once being short term in Ford dogma)
Pp and Ppk are now long term.Reference page 80 of the AIAG SPC book where these terms are
defined.A sample size for a short term study is also much bigger than you
stated.0August 13, 2007 at 6:11 pm #159889
Robert TiptonMember@RobertTipton Include @RobertTipton in your post and this person will
be notified via email.I would like to make a couple of clarification points regarding my previous e-mail and the subject matter.
The original abbreviation and definition of “CPk” was correctly stated. “CPk” stood for Capability Process (CP) as an index number (k). CPk was abbreviated as I have shown it.
Over time this has changed to many versions, such as cpk, Cpk, and CPK. The AIAG SPC training manual typically shows it as “Cpk”. Things do tend to change over time. Based upon the original meaning, I will continue to use CPk. However, it is a personal choice, based upon history.
PPk follows this line of thought. PPk originally stood for Process Potential study, or Preliminary Process study.
The intention was to show a higher expectation for short term capability studies in an effort to better predict long term studies.
Typically, the default value for PPk is a minimum of 1.67 and a minimum of 1.33 for CPk.
The reason for this is that there will most assuredly be more variation over time, than will be discovered in a short term study.
Therefore PPk should be greater than 1.67 and CPk greater than 1.33. This really equals one standard deviation of difference.
If you have a +/ 5 Sigma for a short term study, you will attain an approximate +/ 4 Sigma for a long term study.
This was the original intention of PPk, and CPk and still is a logical path today.
In conclusion, short term studies should be using PPk and long term studies should be using CPk. Definitions of short term and long term vary somewhat in sample size and length of time.
I hope this helps other people to effectively use these tools.
Best Regards,
Robert Tipton
0August 14, 2007 at 12:04 am #159892
Jim AceParticipant@JimAce Include @JimAce in your post and this person will
be notified via email.Robert Tipton, your recent post concerning process capability ratios was nicely written, but there is an arguable error contained within your description. Your post stated that: In conclusion, short term studies should be using PPk and long term studies should be using CPk.
I believe the generally accepted interpretation of capability ratios within the larger community of six sigma practitioners can be generally characterized by the following descriptions.
1. Cp and Cpk represent short-term capability, also called within-group capability. These two capability metrics rely solely on Rbar/d2 or the within-group sums of squares to calculate the standard deviation (see the Minitab Help manual). Calculating in this way assures that the between-group variations are excluded from influencing the size of the standard deviation. In this way the resulting capability metrics (Cp and Cpk) represent the best possible condition of the process.
Cp = (USL – LSL) / (6 · S.within)
Cpk = Cp(1 − k)
2. Pp and Ppk represent long-term capability, also called overall capability. These two capability metrics rely on the total sums of squares, not just the within-group sums of squares. This means the within-group AND between-group sums of squares are used to estimate the standard deviation. In this way the overall standard deviation (that is, the long-term standard deviation) includes all types of variation that occurred over the total period of sampling, not just the variation that occurred within groups. In this way the resulting capability metrics (Pp and Ppk) represent the actual capability of the process, because both random causes and assignable causes are used to estimate capability.
Pp = (USL – LSL) / (6 · S.total)
Ppk = Pp(1 − k)
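These four indices can be sketched in a few lines. The helper below is my own illustration (it assumes subgroups of 5, hence d2 = 2.326, and uses the min() form of the indices, which is equivalent to the 1 − k adjustment):

```python
import numpy as np

def capability_indices(data, usl, lsl, d2=2.326):
    """Cp/Cpk from Rbar/d2 (within-group); Pp/Ppk from the overall sample sigma."""
    s_within = np.ptp(data, axis=1).mean() / d2   # short-term (within) estimate
    s_total = data.std(ddof=1)                    # long-term (overall) estimate
    m = data.mean()
    cp = (usl - lsl) / (6 * s_within)
    cpk = min(usl - m, m - lsl) / (3 * s_within)  # equivalent to Cp(1 - k)
    pp = (usl - lsl) / (6 * s_total)
    ppk = min(usl - m, m - lsl) / (3 * s_total)   # equivalent to Pp(1 - k)
    return cp, cpk, pp, ppk

rng = np.random.default_rng(7)
data = rng.normal(10, 1, size=(10, 5))            # 10 subgroups of 5
print([round(x, 2) for x in capability_indices(data, usl=13, lsl=7)])
```

On a stable, centered process all four come out similar; special-cause variation inflates s_total and drags Pp/Ppk below Cp/Cpk.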
You will find that the reference manual contained within Minitab references Cp and Cpk as potential metrics. This manual also indicates that Cp and Cpk are solely dependent on within-group variation, whereas Pp and Ppk are overall metrics that rely on the total variation. Generally you will find that:
1. Cp will be greater than or equal to Cpk.
2. Pp will be greater than or equal to Ppk.
3. Pp will be greater than or equal to Cp.
Although I have restricted my frame of reference for this post, there are other metrics, like Cpm, that should be considered.
Jim Ace0August 14, 2007 at 12:32 am #159893
JENNIFERParticipant@JENNIFER Include @JENNIFER in your post and this person will
be notified via email.With no disrespect to the previous post, I find it troublesome that we insist on making what should be an easy concept difficult for the masses. Cpk is now used for short-term studies, Ppk for long-term studies. While both sides make compelling arguments, go with the modern definitions.
0August 14, 2007 at 2:40 am #159895
Greg KaoParticipant@GregKao Include @GregKao in your post and this person will
be notified via email.Jim Ace, you have great knowledge of process capability. Your article lets Six Sigma fans understand process capability more clearly. Meanwhile, I think you made a typing mistake.
In your article,
“Pp will be greater than or equal to Cp” should be modified to
“Cp will be greater than or equal to Pp”
0August 14, 2007 at 3:14 am #159898
Jim AceParticipant@JimAce Include @JimAce in your post and this person will
be notified via email.Greg Kao, you are totally correct! I made a mistake, and it makes me feel a little silly. I even read the post 3 times before hitting the Post Message button. I guess this just reinforces the idea that several heads are far smarter than one! Great catch and thank you very much. The wording should have been:
Cp will be greater than or equal to Cpk
Pp will be greater than or equal to Ppk
Cp will be greater than or equal to Pp
Again, I appreciate your attention to detail.
Jim Ace0August 14, 2007 at 6:15 pm #159924I would prefer that this be written as:
Cp will always be greater than or equal to Cpk
Pp will always be greater than or equal to Ppk
Cp will usually be greater than Pp. (There are cases where Pp > Cp.) Similarly, Cpk is usually greater than Ppk, but not always.0August 14, 2007 at 7:00 pm #159925“Cp will always be greater than or equal to Cpk”: this statement is not entirely true!! No offense. It holds good only for a bilateral tolerance and normal data. If the tolerance is unilateral or your distribution is skewed, the Cp value would be less than Cpk. With due respect,
Hardik0August 14, 2007 at 7:32 pm #159927Hardik,
No offense taken. Thank you for your comments.
My opinion is that when you have a unilateral tolerance, Cp is either missing, unnecessary to calculate, or equal to Cpk. I suggest this since the only difference between Cp and Cpk is process centering, and when only one specification exists there is nothing to center between. Also, Cp = [(USL – LSL) / (6 * Sigma)], where Sigma is calculated as [Rbar (or MRbar)] / d2. So, if we agree on this formula, how should Cp be calculated when one specification is missing?
Matt0August 14, 2007 at 7:50 pm #159928Matt, thanks for pointing it out. You are correct; all SQC manuals say the same. In fact, the Cp value does not matter for a unilateral tolerance. However, what I have seen from my experience is that when you have a unilateral tolerance or a maximum material condition for some dimension, a lot of CMM reports give a Cp value. In all those cases the Cp value is less than Cpk.
This is the way they calculate it.
Nominal=0
USL=0.75
hence it’s a unilateral tolerance, so take LSL=0
Moreover, you can’t deny the fact that skewed distributions have Cp less than Cpk. Let me know to what extent my understanding is true!! Cheers,
Hardik0August 14, 2007 at 8:07 pm #159929Yikes!
“the way they calculate it. Nominal=0 USL=0.75 hence it’s unilateral tolerance Take LSL=0”
Wow. They are artificially creating an LSL when it should be missing.
For example, if they set the LSL=0 and the Xbar of the process is 0.375, no harm is done. But if the process Xbar is closer to 0 (better), the Cpk would get worse. In other words, as the process improves, the Cpk would get worse. This doesn’t make much sense to me. I would ask the manufacturer of the CMM not to add an artificial tolerance, such as 0, when it should be left missing.
I guess the problem is that setting the LSL=0 is wrong because it would change the result for Cpk & Ppk when the process average gets more than half the distance between the USL and the LSL.
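A small sketch of this effect with hypothetical numbers (`cpk` is my own illustrative function, and sigma is held constant for simplicity):

```python
# With a unilateral USL, inventing LSL = 0 makes Cpk fall as the process
# improves (mean moves toward zero). Hypothetical data for illustration.
def cpk(mean, sigma, lsl, usl):
    return min(usl - mean, mean - lsl) / (3 * sigma)

sigma = 0.05
usl = 0.75
better = cpk(mean=0.10, sigma=sigma, lsl=0.0, usl=usl)   # closer to zero = better part
worse = cpk(mean=0.375, sigma=sigma, lsl=0.0, usl=usl)   # centered between the fake limits
# The "better" process scores lower once the artificial LSL is in play.
```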
Just my 2¢0August 14, 2007 at 8:47 pm #159931Yeah Matt,I totally agree with you.I found mean deviation value on the same report I am referring to. Thats something new to me. Do you have any idea about it? Is there any relation between mean deviation and mean?Awaiting your valuable response.
0August 15, 2007 at 1:17 pm #159948No, I am not familiar with the term “mean deviation.” Perhaps others in this forum can help.
The following may be unrelated, but I will see if this might be what they are referring to.
I am familiar with “mean” and “standard deviation” and when you take (standard deviation / mean) this is often referred to as the coefficient of variation.
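For instance, a minimal computation with made-up data:

```python
import statistics

# Coefficient of variation: sigma expressed relative to the mean
# (illustrative data, not from the CMM report discussed above)
data = [9.8, 10.1, 10.0, 9.9, 10.2]
cv = statistics.stdev(data) / statistics.mean(data)  # about 0.016, i.e. ~1.6%
```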
The coefficient of variation is a measure of how much variation exists in relation to the mean. Another way to describe it is a measure of the significance of the sigma in relation to the mean. The larger the coefficient of variation, the more significant the sigma, relative to the mean.0August 15, 2007 at 1:25 pm #159949
accringtonParticipant@accrington Include @accrington in your post and this person will
be notified via email.Type ‘mean deviation’ in the ‘with the exact phrase’ box in Google’s advanced search. You’ll get about 397,000 entries.
Good Luck0August 15, 2007 at 1:43 pm #159950Matt,I googled it and found the following results.Thought of sharing with you.http://en.wikipedia.org/wiki/Absolute_deviationhttp://mathworld.wolfram.com/MeanDeviation.htmlMoreover on the topic of unilateral tolerance, I agreed with you as a statistician but when I think as an engineer , it sounds weired to me.Let’s say there is nominal dimension of 10mm with positive tolerance of 1 mm, 10+1.In this case, you can have your dimension falling anywhere between 10 to 11 but can’t have dimension lesser then 10.Hence 10 is your LSL, isn’t it ?
0August 15, 2007 at 1:44 pm #159951Thanks, Accrington for your advice!!
0August 15, 2007 at 2:24 pm #159954I agree that if the target is 10.0 and you are allowed + 1.0, then the USL=11.0 and the target = 10.0. If, in addition, anything less than 10.0 is unacceptable, then the LSL = 10.0. So the specifications would be LSL=10.0, Target=10.0, USL=11.0
0December 17, 2007 at 7:10 am #166187
Mahesh PatilParticipant@MaheshPatil Include @MaheshPatil in your post and this person will
be notified via email.You are making it opposite.
Cpk is Long Term while Ppk is Short term
0December 17, 2007 at 11:15 am #166195Correct
0December 17, 2007 at 2:30 pm #166202
TaylorParticipant@ChadVader Include @ChadVader in your post and this person will
be notified via email.Mahesh & Lolita
Not only have you two answered a post that is 4 YEARS old, you have answered it WRONG.
Please take your time and check0December 17, 2007 at 8:04 pm #166226Lolita,/Marlon/Dog Sxxt/All Things…
The data indicates that you are wrong…0December 19, 2007 at 8:43 am #166324Is that all?
0December 19, 2007 at 1:27 pm #166329
Bucking broncoParticipant@Buckingbronco Include @Buckingbronco in your post and this person will
be notified via email.Lolita is none of those …
Lolita is either one of the girls with a wetT shirt, or one of the guys with the waterpistol.0May 6, 2008 at 11:49 am #171782
mohammed amraniParticipant@mohammedamrani Include @mohammedamrani in your post and this person will
be notified via email.pouvez vous m’expliquer qu’est ce que je dois faire pour étudier la capabilité d’une produit ou d’un process orsqu’il a une seul tolérance (mini)
0August 31, 2008 at 9:08 am #175331This is very confusing sir!
I am requesting all of you to please make one final decision & then communicate.
Manish0August 31, 2008 at 11:43 am #175333These are 2 indices for estimating process capability from historical data. The tricky part is how to estimate the process standard deviation.
Cpk uses only the within subgroup method of estimating the standard deviation
Ppk uses within and between variation to estimate the standard deviation. (Drive your process to a stable condition, and use this to show your process capability.) Even a stable process will show movement of the mean, so why would you not want to encompass all the variation when you estimate the standard deviation for process capability reporting?
I personally think the short-term / long-term mumbo-jumbo just confuses things. These indices are simply based on two methods for estimating the standard deviation from historical data.
Conceptually, the “within” subgroup method makes sense for control charts, since it makes them sensitive enough to detect assignable causes. I am sure that Shewhart would be glad that I agree with him (ha ha).
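The two estimation methods described above can be sketched directly (illustrative functions, not any package’s implementation):

```python
import math
import statistics

# Pooled within-subgroup sigma (Cpk-style): subgroup-to-subgroup shifts
# do not inflate this estimate.
def sigma_within(subgroups):
    ss, dof = 0.0, 0
    for sg in subgroups:
        m = statistics.mean(sg)
        ss += sum((x - m) ** 2 for x in sg)
        dof += len(sg) - 1
    return math.sqrt(ss / dof)

# Overall sigma (Ppk-style): computed from all readings pooled together,
# so movement of the mean between subgroups is included.
def sigma_overall(subgroups):
    return statistics.stdev([x for sg in subgroups for x in sg])
```

If the subgroup means drift, sigma_overall exceeds sigma_within, which is exactly why Ppk comes out below Cpk for an unstable process.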
0January 15, 2009 at 5:53 am #179758
anil yadavParticipant@anilyadav Include @anilyadav in your post and this person will
be notified via email.Any One Could Tell Me When I Should Use Cpk or Ppk
0January 15, 2009 at 11:50 am #179763Ppk is used to get a sense of long term…whatever that means…capability while Cpk looks at short term…whatever that means…variation. I would suggest using Cpk.
0November 11, 2009 at 5:48 am #186738I think you ment vice versa.
If you look a few rows before: “Process Performance is only used when process control cannot be evaluated. An example of this is for a short preproduction run. “
And also this is how it results when an analysis is performed in QS STAT.0November 11, 2009 at 7:55 am #186739Pp and Ppk are called performance indices The performance is measured on quick test means if a batch is ready parts are produced and now if you take a random sample and assess T/6s or USLxbar/ 3s , XbarLSL/3s it is Pp and Ppk. No assessment of trends, run, outlier necessary( Stability not checked) No prediction of process behavior possible hence to minimize risk and have safety margin Value is min 1.67. The distribution need not be normal . Std deviation ia assessed of individual sample s= (Sum(Xbarx)/(n1))^1/2
Cp and Cpk are capability indices based on the long term, assessed only for a stable, corrected process (the sampling technique is random systematic sampling, after monitoring and correcting the process by eliminating special causes). The capability is calculated for a stable process in which variation comes only from common causes and has a natural behavior, so the distribution is close to normal; hence the standard deviation is evaluated based on the sample distribution. This index is used for predicting process behavior.
0November 11, 2009 at 9:24 am #186741Wrong
0November 11, 2009 at 9:44 am #186742Wow this is still going?
Are you all aware that, way back, the consultants at Ford started using Cpk as long term and Ppk as short term,
while the rest of the world used Cpk as short term and Ppk as long term?
I am too young to be able to comment on who was right and who was wrong. I only know that before my time there was a split in how the nomenclature was used. Hence all this horrible confusion.
The calculation for Cpk in general uses a pooled standard deviation of the subgroups. This in effect gives you a standard deviation as if all subgroups had the same average, e.g. as if they were in control (short explanation).
The Ppk in general uses the standard deviation calculated from all the data. This standard deviation being affected by shifts in the process and special causes.
If the process is stable, then Cpk and Ppk will appear similar.
Now, as a flippant aside, Cpk and Ppk are one-sided measures; they only care about the closest specification limit and not where the other limit is found.
Zbench is a much better number to use instead because it takes into account the position of both specification limits.
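A sketch of the Zbench idea, assuming normally distributed data (this is the general concept, not any particular package’s implementation):

```python
from statistics import NormalDist

# Convert the expected fraction outside BOTH spec limits into one Z score
def z_bench(mean, sigma, lsl, usl):
    nd = NormalDist(mean, sigma)
    p_out = nd.cdf(lsl) + (1.0 - nd.cdf(usl))  # tail area below LSL plus above USL
    return NormalDist().inv_cdf(1.0 - p_out)
```

With both limits three sigma from the mean, this returns about 2.78 rather than 3.0, because defects from both tails are counted.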
Besides, you should be looking at the control charts and the capability histograms before any decision is made; that way you have a visual check of the numbers reported, and you can make decisions on what the process stability is like and where the process is.0November 11, 2009 at 4:56 pm #186748Zbench is nonsense
0November 13, 2009 at 1:41 am #186788
SeverinoParticipant@Jsev607 Include @Jsev607 in your post and this person will
be notified via email.Cpk can underestimate the total variation if the between-subgroup variation is high. Therefore, I always use Ppk. Always ensure your process is in control and the data are normally distributed before reporting the index, though.
0November 27, 2009 at 4:14 am #187104
Rishi HansMember@RishiHans Include @RishiHans in your post and this person will
be notified via email.Cpk stands for Continuous Process Kurtosis
Rishi Hans
0November 27, 2009 at 6:43 am #187107I don’t have time to read through all the replies, but the k stands for katayori which I believe means deviation in Japanese. I think the word kurtosis is Greek and means bulge (shape.)
Do you know what the word fiat means? :)0November 27, 2009 at 1:27 pm #187115CPK California Pizza Kitchen
CPK Creatine Phosphokinase (muscle enzyme)
Cpk Process Capability (statistical process control measurement of process capability)
Cpk Process Capability (statistical measurement of how capable a process)
Cpk Capability Index
CPK Center for Personkommunikation
CPK Cabbage Patch Kid
CPK Corey, Pauling, Koltun (molecular modeling coloring scheme)
CPK Chesapeake Regional Airport (airport code, VA)
CPK Crystal Park (Crystal City, Arlington, VA)
CPK Circuit Pack (Nortel)
CPK Common Platform Kit (Pirelli)
CPK Coefficient of Process Capability0November 27, 2009 at 1:45 pm #187116Butler and others dealt with this back in 2007 from an original post in 2005. I think the latest post is a plant from Webby to get us all agitated again.https://www.isixsigma.com/forum/showmessage.asp?messageID=123559
0November 27, 2009 at 8:17 pm #187119I once worked on a project that had a huge CPK (Cabbage Patch Kid) problem as well as employee issues. Luckily it was in a 3rd world country, so we got creative and now I’m responsible for many of the shoes you wear.
Six sigma at its best.
Stevo0 
AuthorPosts
The forum ‘General’ is closed to new topics and replies.