2003-09-15, 17:12  #1
Sep 2003
5×11×47 Posts 
Error rate for LL tests
We estimate error rate as follows:
- Every single line in BAD is a separate verified-bad result.
- Every single line in LUCAS_V.TXT is a separate verified-good result.
- Lines in HRF3.TXT are handled as described below.

The file HRF3.TXT contains unverified results (only one LL test, or more than one but with non-matching double-checks). How do we estimate the error rate for these results?

Any exponent that occurs only once must be ignored: we have no idea whether it is a good or a bad result.

However, when an exponent occurs N times (in N separate lines of HRF3.TXT), we know for sure that N distinct non-matching residues were returned (otherwise there would have been a match and the results would have been removed from HRF3.TXT and moved to the files BAD and LUCAS_V.TXT). Therefore at least N-1 of them must be bad, and the remaining one could be good or bad.

The odds are that the remaining result is good (only we don't yet know which of the N it is). After all, the error rate is relatively low, so the odds of N-1 bad + 1 good are much larger than the odds of all N bad. In the most common case of 2 separate lines in HRF3.TXT for the same exponent, usually one will be good and one will be bad, and a triple-check will sort out which is which.

So, to summarize:
- If an exponent occurs in only one line in HRF3.TXT, ignore it.
- If an exponent occurs in N separate lines in HRF3.TXT, assume one good result and N-1 bad results.

Error rates for the various exponent ranges are: [Sept 9 2003 data]
Code:
         0 -    999,999   (163+0-0)/(163+0+70581) = .002
 1,000,000 -  1,999,999   (718+0-0)/(718+0+58971) = .012
 2,000,000 -  2,999,999   (1203+0-0)/(1203+0+54591) = .021
 3,000,000 -  3,999,999   (1465+0-0)/(1465+0+52939) = .026
 4,000,000 -  4,999,999   (1837+0-0)/(1837+0+51026) = .034
 5,000,000 -  5,999,999   (1905+0-0)/(1905+0+49346) = .037
 6,000,000 -  6,999,999   (1804+0-0)/(1804+0+49253) = .035
 7,000,000 -  7,999,999   (1956+27-12)/(1956+27+47579) = .039
 8,000,000 -  8,999,999   (1612+500-235)/(1612+500+45865) = .039
 9,000,000 -  9,999,999   (625+1312-639)/(625+1312+33724) = .036
10,000,000 - 10,999,999   (53+1369-672)/(53+1369+2978) = .170
11,000,000 - 11,999,999   (50+1384-679)/(50+1384+1993) = .220
12,000,000 - 12,999,999   (31+1819-895)/(31+1819+1415) = .292
13,000,000 - 13,999,999   (33+1611-798)/(33+1611+1392) = .278
14,000,000 - 14,999,999   (4+1541-764)/(4+1541+1172) = .287
15,000,000 - 15,999,999   (2+1091-541)/(2+1091+796) = .292
16,000,000 - 16,999,999   (0+757-375)/(0+757+598) = .281
17,000,000 - 17,999,999   (0+134-67)/(0+134+233) = .182
18,000,000 - 18,999,999   (0+86-43)/(0+86+174) = .165
19,000,000 - 19,999,999   (2+32-16)/(2+32+40) = .243
20,000,000 - 20,999,999   (1+2-1)/(1+2+14) = .117

- The results for the low exponents have very low error rates. Maybe this is because the run time is very short, or maybe for such old results the bad results were purged or not recorded.
- The results for the higher exponents are artificially high. This is because when the server gets a result returned with a non-zero error code, it automatically reassigns that exponent for another first-time LL test without waiting a couple of years for regular double-checking to catch up to the current first-time range. Thus, a significant fraction of bad results are caught much sooner, but good results are not verified until perhaps years later.

Note: it is possible for a non-zero error code to still yield a good result and it is possible for a zero error code to yield a bad result. See the "Most popular error codes" thread.
Since the current leading edge of double-checking is around 10.1M, all error rates above this are artificially high for the time being.

We also note:

So far, there is no evidence that error rates are increasing for larger exponents. The error rate remains steady around 3.5% - 4.0% over a broad range of exponents. Larger exponents have longer run times and thus we might expect more errors, but on the other hand newer machines run Windows XP and other modern operating systems with much better memory protection. So perhaps these effects cancel each other.

Note that this error rate of 3.5% - 4.0% is an average over all users and computers. Some computers have a 0% error rate, others have a high double-digit error rate. This depends on hardware issues, memory quality, CPU temperature, etc.

Finally, we might ask, what do we get if we only consider results returned by programs Wxx (George Woltman's Prime95/mprime) and ignore results returned by other programs? The answer is: almost exactly the same.

Error rates for the various exponent ranges, taking into account only results returned by programs Wxx (George Woltman), are: [Sept 9 2003 data]
Code:
         0 -    999,999   (85+0-0)/(85+0+30303) = .002
 1,000,000 -  1,999,999   (552+0-0)/(552+0+45733) = .011
 2,000,000 -  2,999,999   (1171+0-0)/(1171+0+52421) = .021
 3,000,000 -  3,999,999   (1445+0-0)/(1445+0+52267) = .026
 4,000,000 -  4,999,999   (1779+0-0)/(1779+0+49024) = .035
 5,000,000 -  5,999,999   (1891+0-0)/(1891+0+48521) = .037
 6,000,000 -  6,999,999   (1793+0-0)/(1793+0+47982) = .036
 7,000,000 -  7,999,999   (1945+27-12)/(1945+27+46770) = .040
 8,000,000 -  8,999,999   (1602+500-235)/(1602+500+45616) = .039
 9,000,000 -  9,999,999   (622+1312-639)/(622+1312+33397) = .036
10,000,000 - 10,999,999   (53+1369-672)/(53+1369+2964) = .170
11,000,000 - 11,999,999   (49+1384-679)/(49+1384+1984) = .220
12,000,000 - 12,999,999   (30+1819-895)/(30+1819+1405) = .293
13,000,000 - 13,999,999   (33+1611-798)/(33+1611+1330) = .284
14,000,000 - 14,999,999   (4+1541-764)/(4+1541+1147) = .290
15,000,000 - 15,999,999   (1+1091-541)/(1+1091+781) = .294
16,000,000 - 16,999,999   (0+757-375)/(0+757+581) = .285
17,000,000 - 17,999,999   (0+134-67)/(0+134+225) = .186
18,000,000 - 18,999,999   (0+86-43)/(0+86+168) = .169
19,000,000 - 19,999,999   (2+32-16)/(2+32+40) = .243
20,000,000 - 20,999,999   (1+2-1)/(1+2+12) = .133
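As a check on how the rows are built, the sketch below recomputes the 7,000,000 - 7,999,999 row of the first table. It assumes, from the description earlier in this post, that the three terms in each numerator are the BAD count, the number of HRF3.TXT lines belonging to exponents that occur more than once, and the number of such exponents, and that the printed rates are truncated (not rounded) to three decimals. The function and parameter names are hypothetical.

```python
import math


def row_rate(bad, unverified_lines, unverified_exponents, good):
    """Error rate for one exponent range.

    bad: lines in BAD (verified-bad results)
    unverified_lines: HRF3.TXT lines for exponents occurring more than once
    unverified_exponents: number of distinct such exponents (assumed: one
        good result each, so they are subtracted from the bad count)
    good: lines in LUCAS_V.TXT (verified-good results)
    """
    numerator = bad + unverified_lines - unverified_exponents
    denominator = bad + unverified_lines + good
    return numerator / denominator


def truncate3(x):
    """Truncate to three decimals, which is how the tables appear to be printed."""
    return math.floor(x * 1000) / 1000


rate = row_rate(bad=1956, unverified_lines=27, unverified_exponents=12, good=47579)
print(truncate3(rate))  # 0.039
```

Rounding instead of truncating would give .040 for this row, which is why truncation is assumed here.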
2003-09-15, 18:17  #2
Aug 2002
Richland, WA
10000100_{2} Posts 
Re: Error rate for LL tests
Quote:


2003-09-15, 22:14  #3
Sep 2003
5×11×47 Posts 
Re: Re: Error rate for LL tests
Quote:
In the range 7M-8M there are only 60 exponents that have never had at least two LL tests done. In the range 8M-9M, there are only 525 such exponents, and in the range 9M-10M, there are 5791 such exponents. So arguably, only the 9M-10M error rate could be expected to change much over time.

For higher exponents, the rates are artificially high because results returned with a non-zero error code get double-checked several years sooner than results returned with a zero error code. That is because the server immediately reassigns such non-zero-error-code results for another "first-time" LL test.

However, as soon as the leading edge of double-checking (currently around 10.1M) arrives, all those lagging double-checks of zero-error-code results finally end up getting done and the ratio gets back into proper balance. For this reason, I'd argue that for anything below about 0.5M less than the leading edge of double-checking, we already have a fairly accurate estimate of error rate.

Quote:
It's unfortunate that the server behavior which is optimized for detecting bad results as quickly as possible also makes it very difficult to estimate error rates for the leading edge of first-time LL tests.

2003-09-15, 22:18  #4
Sep 2003
5·11·47 Posts 
To summarize, the algorithm I use is:
If an exponent has had 1 LL test done:
- We can't draw any conclusions.

If an exponent has had (N > 1) LL tests done, with a match:
- We know exactly how many of the N tests are good and how many are bad.

If an exponent has had (N > 1) LL tests done, with no match:
- We know that at least N-1 of the tests are bad.
- Assume N-1 bad and 1 good, because that's much more likely than all N bad.
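The rules above can be sketched in Python. This is only a minimal sketch, not the script that produced the tables: the function names are hypothetical, and it assumes the three files have already been reduced to a count of BAD lines, a count of LUCAS_V.TXT lines, and a list holding one exponent per HRF3.TXT line (the actual file formats are not shown in this thread). The second function puts a number on "much more likely", under the added assumptions that results fail independently with probability p and that two bad residues never coincide.

```python
from collections import Counter


def estimate_error_rate(bad_count, good_count, unverified_exponents):
    """Estimate the LL error rate for one exponent range.

    bad_count: number of lines in BAD (verified-bad results)
    good_count: number of lines in LUCAS_V.TXT (verified-good results)
    unverified_exponents: one exponent per HRF3.TXT line (assumed input shape)
    """
    per_exponent = Counter(unverified_exponents)
    # An exponent seen only once tells us nothing, so it is ignored.
    repeated = {p: n for p, n in per_exponent.items() if n > 1}
    unverified_lines = sum(repeated.values())
    # For each repeated exponent, assume 1 good result and N-1 bad ones.
    assumed_bad = unverified_lines - len(repeated)
    total = bad_count + unverified_lines + good_count
    if total == 0:
        return None  # no usable results in this range
    return (bad_count + assumed_bad) / total


def odds_one_good_vs_all_bad(n, p):
    """Ratio P(exactly one good, n-1 bad) / P(all n bad).

    Assumes independent results with error probability p (0 < p < 1):
    n * (1 - p) * p**(n - 1) divided by p**n simplifies to n * (1 - p) / p.
    """
    return n * (1 - p) / p
```

For example, with a 4% error rate and N = 2 non-matching results, the ratio is 2 × 0.96 / 0.04 = 48, so "one good, one bad" is about 48 times more likely than "both bad" under these assumptions.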
2003-09-15, 23:26  #5
P90 years forever!
Aug 2002
Yeehaw, FL
17012_{8} Posts 
The error checking has not improved much since way back. There have been some changes around the edges: more conservative FFT lengths, roundoff checking every iteration if near an FFT limit, tolerating roundoff errors up to 0.6, etc.
Also, I think the error rate is likely to remain fairly constant because computers are getting faster at roughly the same rate as the difficulty of running an LL test. That is, a 10 million first time test 3 years ago probably took as much elapsed time as a 20 million first time test today. 
2003-09-15, 23:34  #6
Aug 2002
Richland, WA
2^{2}×3×11 Posts 
I understand the algorithm you are using and I agree that it is fairly accurate, but I'm not convinced it is accurate enough to say that the error rate is not still increasing with exponent size. I'd be willing to concede that the error rates for the 7M and 8M ranges are probably not going to change very much, but I don't see how we can conclude that the error rate for the 9M range is definitely not going to end up greater than 4%.
I just don't trust this type of prediction when there may be a bias one way or the other in the exponents that have had enough tests run on them to be used in your data.

I do think it is likely that the error rates will level off or drop over time, simply because:
(1) George has improved Prime95/mprime error checking over time.
(2) I think error rates are mostly a function of runtime for an exponent, and I think average runtimes are levelling off if not dropping over time (which was not the case early in the project's history).

However, it is possible that these factors may be countered (at least in the 7M to 20M ranges) by the fact that processors in the last few years have been running hotter than they did in the past due to greater competition among the CPU makers.