You could say that my journey started when I wanted to know whether forecast predictions could be made on "random" numbers. That quest will probably stay with me throughout my life. At the moment I'm applying it to the lottery, despite everyone who has told me there's absolutely nothing to analyze once the word "random" is attached.
Seven years ago, I pulled up the PowerBall® and Fantasy 5 (now called The Pick) number histories. Both follow the same idea: random numbers are drawn from a pool and people place bets.
What I noticed almost immediately is that the Fantasy 5 counts were very evenly distributed when I summed the counts for individual ball numbers over a range, versus the mountain peaks and valleys of the PB. Of all the analysis I've done since then, the PB is still the only game with results that aren't easily explainable (and the only one I know of, or have looked at, that still uses real balls rather than a computer simulation).
I've run counts on ranges, sometimes focusing on one too many levels, and have decided to call it quits several times, until a little crack of light dawns and brings new energy. One big example is when I found the hidden part of PB's site that not only shows which ball set each draw came from, but also indicates that 6 draws occur per drawing. I spent countless hours trying to separate the ball sets for examination, to little avail.
Then I found what appeared to be some typos or discrepancies and contacted MUSL. To my surprise, I had indeed found typos, and in return I asked if there was any way to get data prior to 2005. I now have data going back 18 years, and I've still found more typos (about 55 data points out of 11,000), roughly half of which they told me are too old to get definitive answers on (these alter some of my results, but only by a hair of a margin).
The most recent findings that baffled me (besides the fact that since 1992 they've had 4 ball sets and 6 draws per drawing) are the quad/fifth duplication and the chi-test results. I'll try to summarize both concisely here.
Quad/duplication (from when I only had 4,000 rows of data versus the 11,000 I have now):
First, I wanted to know how often 4 of the 5 drawn white-ball numbers matched 4 numbers in any other draw (choosing 4 from even a 55-ball pool gives about 341,000 combinations, which suggests it shouldn't happen very often). I created a macro in Excel to flag every occurrence and found that it happened far more than I anticipated: 103%, to be exact. That is, out of 4,000 rows of draw data, 4 of the 5 numbers on one row matched another row 4,120 times in total, because some draws/rows have multiple quad duplications elsewhere.
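For anyone who'd rather reproduce the count outside Excel, here is a minimal Python sketch of the same idea. This is my guess at an equivalent of the macro's logic, not the macro itself, and the example draws are made up:

```python
from itertools import combinations

def count_quad_matches(draws):
    """Count row-level hits where a draw shares at least 4 of its 5
    white-ball numbers with another draw. Each matching pair scores
    one hit for each of its two rows, so one row can accumulate
    several hits -- which is how totals above 100% can happen."""
    sets = [set(d) for d in draws]
    hits = 0
    for a, b in combinations(range(len(sets)), 2):
        if len(sets[a] & sets[b]) >= 4:
            hits += 2  # one hit for each row of the matching pair
    return hits

# Tiny made-up example: the first two draws share 4 numbers.
example = [
    (3, 11, 24, 38, 52),
    (3, 11, 24, 38, 49),
    (1, 2, 5, 9, 10),
]
print(count_quad_matches(example))  # -> 2
```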
Second, I wanted to know how often 5 of the 5 matched. So far (again, when I was only examining 4,000 rows), only 2 such matches appear among the public draws. Interestingly enough, numbers generated from atmospheric noise by a random-number site yielded similar results, meaning that in a span of only 1,500 consecutive draws, all 5 can match identically. But back to the 5-of-5 matches: even though there are roughly 5 million combinations, they appear in only 2 of the "public" winning draws, while 31 other identical 5-number matches just happened to fall on the pre- and post-tests instead of public draws. (Or possibly they didn't want the public to see that that kind of duplication does happen. Design of the system, anyone?)
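As a sanity check on whether these duplication counts are actually surprising, the expected number of matching pairs can be worked out under a simplifying assumption: independent, uniform draws from one fixed 5-of-55 pool. (The real history doesn't satisfy this, since the white-ball pool size changed several times over the years, so treat the output as a rough baseline only.)

```python
from math import comb

def expected_quad_pairs(n_draws, pool=55, k=5):
    """Expected number of draw pairs sharing at least 4 of k numbers,
    assuming independent uniform draws from one fixed pool
    (an approximation -- the real pool size changed over the years)."""
    total = comb(pool, k)
    p_five = 1 / total                       # all 5 numbers match
    p_four = comb(k, 4) * (pool - k) / total  # exactly 4 match
    return comb(n_draws, 2) * (p_four + p_five)

print(round(expected_quad_pairs(4000)))  # -> 577
```

The same `comb(n_draws, 2) / comb(pool, k)` idea gives the baseline for full 5-of-5 matches, which is how you can judge whether 2 public matches plus 31 pre/post-test matches is odd or expected once all 6 rows per drawing are counted.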
Chi:
After probing MUSL further about the typos they couldn't pull up data for, they also shared that they break the ball sets down separately and run chi tests. I immediately hit the internet to learn what chi was and how I could use it. Nothing really helped until, after my subconscious had soaked on it for a couple of hours, I realized that each ball set is one row of counts (a running history) across columns of ball numbers, and that expected values can be calculated, all in Excel. Done manually, Excel will give you a chi-probability-test percentage based on one setting of how deep to look. For me, going from start to finish on a particular ball set, I could get the final chi result for that whole set.
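Outside Excel, that per-ball-set chi statistic can be sketched in a few lines of Python. This is a simplified version assuming one fixed pool size and a uniform expectation; Excel's CHISQ.DIST.RT (or the older CHITEST/CHISQ.TEST against the expected range) is what turns the statistic into the probability percentage I mentioned:

```python
from collections import Counter

def chi_square_stat(draws, pool_size):
    """Pearson chi-square statistic for ball-number counts versus a
    uniform expectation, mirroring the Excel layout: one running row
    of observed counts across columns of ball numbers."""
    counts = Counter()
    for draw in draws:
        counts.update(draw)
    expected = sum(counts.values()) / pool_size  # each number equally likely
    return sum((counts[b] - expected) ** 2 / expected
               for b in range(1, pool_size + 1))
```

A perfectly even history gives a statistic of 0; the further the counts drift from uniform, the larger it grows.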
Then I realized I could create a macro that would step through the history 6 draws at a time, grouped by the ball set each drawing used (keeping one actual drawing together rather than separating its rows one line at a time). What I mean by this is: take the very first ball set's first drawing (all 6 draws, not just one), calculate the chi, and record it at its interval level/depth (in this instance, #1). Then step the macro to the next drawing on that ball set and recalculate after adding those 6 new rows of 5 drawn balls. Because you have the separation by ball set AND 6 draws per drawing, it doesn't take more than 5 weeks of history to get past the first 50 results you discard (instead of one full year), as most chi tutorials say is a needed examination step. What you get is the charts I had already posted to this thread (and the comments I included explaining how oddly they behave). Compared to numbers generated from the random sample source mentioned earlier, PB seems controlled.
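The stepping macro itself can be sketched roughly like this. It's a guess at the logic, assuming the history is a list of (ball_set_id, numbers) rows in draw order with 6 rows per drawing and one ball set per drawing:

```python
from collections import Counter, defaultdict

def rolling_chi_by_ball_set(rows, pool_size):
    """Cumulative chi statistic per ball set, stepping one drawing
    (6 rows) at a time. `rows` is a list of (ball_set_id, numbers)
    tuples in draw order. A sketch of the macro described above,
    not the actual Excel code."""
    counts = defaultdict(Counter)   # ball_set_id -> running number counts
    results = defaultdict(list)     # ball_set_id -> chi at each interval
    for i in range(0, len(rows), 6):          # 6 rows per drawing
        block = rows[i:i + 6]
        set_id = block[0][0]                  # one drawing uses one ball set
        for _, numbers in block:
            counts[set_id].update(numbers)
        c = counts[set_id]
        expected = sum(c.values()) / pool_size
        results[set_id].append(
            sum((c[b] - expected) ** 2 / expected
                for b in range(1, pool_size + 1)))
    return results
```

The first results in each list would still be discarded, as the tutorials suggest, until the expected counts per ball number are large enough for the chi test to mean anything.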
So now that I've got 18 years of data, down to the ball-set numbers and all 6 draws per drawing since then (despite 4 draws from 1997 that contain errors and affect 45 later draws), my plan is to recalculate the quad/fifth duplication and chi results, because before I was only working with 36% of the history.
Of course, if anyone cares enough to want to help dissect any aspect of this, I'm willing to share the data I have. Most don't care about "every little detail," whereas I have to know why they've done what they've done in order to attempt to come up with quality bets.