Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics,[1][2][3] and is particularly problematic when frequency data are unduly given causal interpretations.[4] The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the statistical modeling[4][5] (e.g., through cluster analysis[6]).
Simpson's paradox has been used to illustrate the kind of misleading results that the misuse of statistics can generate.[7][8]
Edward H. Simpson first described this phenomenon in a technical paper in 1951,[9] but the statisticians Karl Pearson (in 1899[10]) and Udny Yule (in 1903[11]) had mentioned similar effects earlier. The name Simpson's paradox was introduced by Colin R. Blyth in 1972.[12] It is also referred to as Simpson's reversal, the Yule–Simpson effect, the amalgamation paradox, or the reversal paradox.[13]
Mathematician Jordan Ellenberg argues that Simpson's paradox is misnamed as "there's no contradiction involved, just two different ways to think about the same data" and suggests that its lesson "isn't really to tell us which viewpoint to take but to insist that we keep both the parts and the whole in mind at once."[14]
Examples
UC Berkeley gender bias
One of the best-known examples of Simpson's paradox comes from a study of gender bias in graduate school admissions to the University of California, Berkeley. The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance.[15][16]
| | All: applicants | All: admitted | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
|---|---|---|---|---|---|---|
| Total | 12,763 | 41% | 8,442 | 44% | 4,321 | 35% |
However, when the departments being applied to are taken into account, the differing rejection rates show how much harder some departments were to get into than others. The data also showed that women tended to apply to more competitive departments with lower rates of admission, even among qualified applicants (such as the English department), whereas men tended to apply to less competitive departments with higher rates of admission (such as the engineering department). The pooled and corrected data showed a "small but statistically significant bias in favor of women".[16]
The data from the six largest departments are listed below:
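| Department | All: applicants | All: admitted | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
|---|---|---|---|---|---|---|
| A | 933 | 64% | 825 | 62% | 108 | 82% |
| B | 585 | 63% | 560 | 63% | 25 | 68% |
| C | 918 | 35% | 325 | 37% | 593 | 34% |
| D | 792 | 34% | 417 | 33% | 375 | 35% |
| E | 584 | 25% | 191 | 28% | 393 | 24% |
| F | 714 | 6% | 373 | 6% | 341 | 7% |
| Total | 4526 | 39% | 2691 | 45% | 1835 | 30% |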
Across the entire data set, 4 of the 85 departments were significantly biased against women and 6 were significantly biased against men (not all of them appear in the table of the six largest departments above). Notably, the conclusion did not rest on the number of biased departments; it rested on the gender admission rates pooled across all departments, weighted by each department's rejection rate across all of its applicants.[16]
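The reversal in the six-department table can be checked with a few lines of arithmetic. The sketch below is illustrative only: it rebuilds each gender's pooled admission rate from the rounded per-department percentages in the table, so the results are approximate, and the variable and function names are chosen here rather than taken from the study.

```python
# Six largest departments, copied from the table above:
# (men applicants, men admitted %, women applicants, women admitted %)
departments = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (373, 6, 341, 7),
}


def pooled_rate(rows):
    """Applicant-weighted admission rate (percent) over (applicants, percent) pairs."""
    rows = list(rows)
    total_applicants = sum(n for n, _ in rows)
    return sum(n * pct for n, pct in rows) / total_applicants


men_pooled = pooled_rate((m_n, m_pct) for m_n, m_pct, _, _ in departments.values())
women_pooled = pooled_rate((w_n, w_pct) for _, _, w_n, w_pct in departments.values())

# Departments in which women were admitted at a higher rate than men.
women_higher = [d for d, (_, m_pct, _, w_pct) in departments.items() if w_pct > m_pct]

print(f"pooled admission rate, men:   {men_pooled:.1f}%")
print(f"pooled admission rate, women: {women_pooled:.1f}%")
print("departments where women had the higher rate:", women_higher)
```

Women have the higher admission rate in most of the listed departments, yet the pooled rate favors men, because most women applied to departments C to F, where admission rates are low for everyone.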
Kidney stone treatment
Another example comes from a real-life medical study[17] comparing the success rates of two treatments for kidney stones.[18] The table below shows the success rates (here meaning the proportion of successful treatments) and the numbers of treatments for both small and large kidney stones, where Treatment A includes open surgical procedures and Treatment B includes closed surgical procedures. The numbers in parentheses give the number of successes over the total size of the group.
| Stone size | Treatment A | Treatment B |
|---|---|---|
| Small stones | Group 1: 93% (81/87) | Group 2: 87% (234/270) |
| Large stones | Group 3: 73% (192/263) | Group 4: 69% (55/80) |
| Both | 78% (273/350) | 83% (289/350) |
The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B appears to be more effective when both sizes are considered together. In this example, the "lurking" variable (or confounding variable) causing the paradox is the size of the stones, which researchers did not know to be important until its effects were included.
Which treatment is considered better is determined by which success ratio (successes/total) is larger. The reversal of the inequality between the two ratios in the combined data, which creates Simpson's paradox, happens because two effects occur together:
The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give cases with large stones the better treatment A, and the cases with small stones the inferior treatment B. Therefore, the totals are dominated by groups 3 and 2, and not by the two much smaller groups 1 and 4.
The lurking variable, stone size, has a large effect on the ratios; i.e., the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones using treatment A (group 3) does worse than the group with small stones, even if the latter used the inferior treatment B (group 2).
Based on these effects, the paradoxical result is seen to arise because the effect of the size of the stones overwhelms the benefits of the better treatment (A). In short, the less effective treatment B appeared to be more effective because it was applied more frequently to the small stones cases, which were easier to treat.[18]
Jaynes argues that the correct conclusion is that though treatment A remains noticeably better than treatment B, the kidney stone size is more important.[19]
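The comparison can be reproduced directly from the counts in the table. The following is a small sketch using exact fractions; the dictionary layout and the helper name `rate` are illustrative choices, not part of the original study.

```python
from fractions import Fraction

# (successes, patients) for each treatment and stone size, from the table above.
outcomes = {
    ("A", "small"): (81, 87),
    ("B", "small"): (234, 270),
    ("A", "large"): (192, 263),
    ("B", "large"): (55, 80),
}


def rate(treatment, sizes=("small", "large")):
    """Success proportion of a treatment, pooled over the given stone sizes."""
    successes = sum(outcomes[(treatment, s)][0] for s in sizes)
    patients = sum(outcomes[(treatment, s)][1] for s in sizes)
    return Fraction(successes, patients)


for sizes in (("small",), ("large",), ("small", "large")):
    a, b = rate("A", sizes), rate("B", sizes)
    better = "A" if a > b else "B"
    print(f"{'+'.join(sizes):12s} A={float(a):.3f}  B={float(b):.3f}  better: {better}")
```

Treatment A comes out ahead within each stone size, while B comes out ahead once the sizes are pooled, because 263 of A's 350 patients had large stones whereas 270 of B's 350 patients had small ones.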
Batting averages
A common example of Simpson's paradox involves the batting averages of players in professional baseball. It is possible for one player to have a higher batting average than another player each year for a number of years, but to have a lower batting average across all of those years. This phenomenon can occur when there are large differences in the number of at bats between the years. Mathematician Ken Ross demonstrated this using the batting averages of two baseball players, Derek Jeter and David Justice, during the years 1995 and 1996:[20][21]
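| Batter | 1995 (hits/at bats) | 1995 average | 1996 (hits/at bats) | 1996 average | Combined (hits/at bats) | Combined average |
|---|---|---|---|---|---|---|
| Derek Jeter | 12/48 | .250 | 183/582 | .314 | 195/630 | .310 |
| David Justice | 104/411 | .253 | 45/140 | .321 | 149/551 | .270 |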
In both 1995 and 1996, Justice had a higher batting average than Jeter did. However, when the two baseball seasons are combined, Jeter shows a higher batting average than Justice. According to Ross, this phenomenon would be observed about once per year among the possible pairs of players.[20]
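The same arithmetic can be sketched in code. The figures are the hits and at bats from the table; the `records` layout and the `average` helper are illustrative names introduced here.

```python
# (hits, at bats) per season, from the table above.
records = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}


def average(player, years):
    """Batting average over the given seasons: total hits / total at bats."""
    hits = sum(records[player][y][0] for y in years)
    at_bats = sum(records[player][y][1] for y in years)
    return hits / at_bats


for years in ([1995], [1996], [1995, 1996]):
    jeter = average("Derek Jeter", years)
    justice = average("David Justice", years)
    leader = "Jeter" if jeter > justice else "Justice"
    print(f"{years}: Jeter {jeter:.3f}, Justice {justice:.3f}, higher: {leader}")
```

The combined average is not the mean of the yearly averages: each season is weighted by its at bats, and Jeter's strong 1996 season carries 582 of his 630 at bats, while Justice's weaker 1995 season carries 411 of his 551.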
Vector interpretation
Simpson's paradox can also be illustrated using a 2-dimensional vector space.[22] A success rate of $p/q$ (i.e., successes/attempts) can be represented by a vector $\vec{A} = (q, p)$, with a slope of $p/q$. A steeper vector then represents a greater success rate. If two rates $p_1/q_1$ and $p_2/q_2$ are combined, as in the examples given above, the result can be represented by the sum of the vectors $(q_1, p_1)$ and $(q_2, p_2)$, which according to the parallelogram rule is the vector $(q_1 + q_2, p_1 + p_2)$, with slope $\frac{p_1 + p_2}{q_1 + q_2}$.
Simpson's paradox says that even if a vector $\vec{L}_1$ (in orange in figure) has a smaller slope than another vector $\vec{B}_1$ (in blue), and $\vec{L}_2$ has a smaller slope than $\vec{B}_2$, the sum of the two vectors $\vec{L}_1 + \vec{L}_2$ can potentially still have a larger slope than the sum of the two vectors $\vec{B}_1 + \vec{B}_2$, as shown in the example. For this to occur one of the orange vectors must have a greater slope than one of the blue vectors (here $\vec{L}_2$ and $\vec{B}_1$), and these will generally be longer than the alternatively subscripted vectors, thereby dominating the overall comparison.
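As a worked instance, the kidney stone counts from the earlier example can be written as vectors of the form (attempts, successes). The labels $\vec{a}_i$ (treatment A) and $\vec{b}_i$ (treatment B), with $i = 1$ for small stones and $i = 2$ for large stones, are notation chosen here for illustration and are not tied to the colors in the figure.

```latex
\begin{align*}
\vec{a}_1 &= (87,\ 81),   & \vec{a}_2 &= (263,\ 192), & \vec{a}_1 + \vec{a}_2 &= (350,\ 273),\\
\vec{b}_1 &= (270,\ 234), & \vec{b}_2 &= (80,\ 55),   & \vec{b}_1 + \vec{b}_2 &= (350,\ 289).
\end{align*}
\begin{align*}
\text{slope}(\vec{a}_1) = \tfrac{81}{87} \approx 0.93 &> \text{slope}(\vec{b}_1) = \tfrac{234}{270} \approx 0.87,\\
\text{slope}(\vec{a}_2) = \tfrac{192}{263} \approx 0.73 &> \text{slope}(\vec{b}_2) = \tfrac{55}{80} \approx 0.69,\\
\text{slope}(\vec{a}_1 + \vec{a}_2) = \tfrac{273}{350} = 0.78 &< \text{slope}(\vec{b}_1 + \vec{b}_2) = \tfrac{289}{350} \approx 0.83.
\end{align*}
```

The long vectors are $\vec{a}_2$ (shallow) and $\vec{b}_1$ (steep), so each sum points mostly in the direction of its longer summand, which is what allows the order of the slopes to flip.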
Correlation between variables
Simpson's reversal can also arise in correlations, in which two variables appear to be (say) positively correlated with one another when in fact they are negatively correlated, the reversal having been brought about by a "lurking" confounder. Berman et al.[23] give an example from economics, where a dataset suggests that overall demand is positively correlated with price (that is, higher prices lead to more demand), contrary to expectation. Analysis reveals time to be the confounding variable: plotting both price and demand against time reveals the expected negative correlation over various periods, which then reverses to become positive if the influence of time is ignored by simply plotting demand against price.
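A correlation reversal of this kind is easy to reproduce with made-up numbers. The sketch below uses a small synthetic data set, not the data of Berman et al.: within each hypothetical time period demand falls as price rises, but both drift upward from one period to the next. The `pearson` helper is written out by hand so that the block needs only the standard library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic (hypothetical) data: three periods, five observations each.
# Within a period, each step up in price lowers demand by one unit;
# between periods, both price and demand shift upward.
periods = {}
for t in range(3):
    prices = [10 * t + step for step in range(5)]
    demands = [50 + 20 * t - step for step in range(5)]
    periods[t] = (prices, demands)

within = {t: pearson(p, d) for t, (p, d) in periods.items()}

all_prices = [x for p, _ in periods.values() for x in p]
all_demands = [y for _, d in periods.values() for y in d]
overall = pearson(all_prices, all_demands)

for t, r in within.items():
    print(f"period {t}: correlation {r:+.2f}")
print(f"all periods pooled: correlation {overall:+.2f}")
```

Each period shows a strongly negative correlation, while the pooled data show a strongly positive one, because the shift between periods is much larger than the variation within any one period.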
Psychology
Psychological interest in Simpson's paradox seeks to explain why people at first deem sign reversal to be impossible. The question is where this strong intuition comes from and how it is encoded in the mind.
Simpson's paradox demonstrates that this intuition cannot be derived from either classical logic or probability calculus alone, and thus led philosophers to speculate that it is supported by an innate causal logic that guides people in reasoning about actions and their consequences.[4] Savage's sure-thing principle[12] is an example of what such logic may entail. A qualified version of Savage's sure-thing principle can indeed be derived from Pearl's do-calculus[4] and reads: "An action A that increases the probability of an event B in each subpopulation C_i of C must also increase the probability of B in the population as a whole, provided that the action does not change the distribution of the subpopulations." This suggests that knowledge about actions and consequences is stored in a form resembling causal Bayesian networks.
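The role of the proviso in the qualified principle can be seen from a short calculation. The following is only a sketch in interventional notation, where $do(A)$ and $do(\neg A)$ denote taking and not taking the action; it is an application of the law of total probability, not the do-calculus derivation itself.

```latex
\begin{align*}
P(B \mid do(A)) &= \sum_i P(B \mid do(A), C_i)\, P(C_i \mid do(A)),\\
P(B \mid do(\neg A)) &= \sum_i P(B \mid do(\neg A), C_i)\, P(C_i \mid do(\neg A)).
\end{align*}
% Proviso: the action does not change the distribution of the subpopulations,
% i.e. P(C_i | do(A)) = P(C_i | do(not A)) = P(C_i). Then
\begin{align*}
P(B \mid do(A)) - P(B \mid do(\neg A))
  = \sum_i \bigl[ P(B \mid do(A), C_i) - P(B \mid do(\neg A), C_i) \bigr]\, P(C_i) > 0,
\end{align*}
% because every bracketed term is positive by assumption.
```

Without the proviso the two sums use different weights, $P(C_i \mid do(A))$ and $P(C_i \mid do(\neg A))$, and the difference can take either sign, which is exactly the room in which a reversal arises.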
Probability
A paper by Pavlides and Perlman presents a proof, due to Hadjicostas, that in a random 2 × 2 × 2 table with uniform distribution, Simpson's paradox will occur with a probability of exactly 1⁄60.[24] A study by Kock suggests that the probability that Simpson's paradox would occur at random in path models (i.e., models generated by path analysis) with two predictors and one criterion variable is approximately 12.8 percent, slightly higher than 1 occurrence per 8 path models.[25]
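The order of magnitude of the 2 × 2 × 2 figure can be explored with a Monte Carlo sketch. Two modelling choices below are assumptions made for this illustration rather than details taken from the cited paper: "uniform distribution" is read as cell probabilities drawn uniformly from the probability simplex (a flat Dirichlet distribution), and a reversal in either direction is counted as an occurrence. The function names are hypothetical.

```python
import random


def is_simpson_reversal(table):
    """Check a 2x2x2 table for a reversal of association.

    table[k] = (a, b, c, d) holds the weights of stratum k for
    (treated & success, treated & failure, untreated & success, untreated & failure).
    """
    def sign(a, b, c, d):
        # a/(a+b) > c/(c+d) is equivalent to a*d > b*c.
        return 1 if a * d > b * c else -1

    first, second = (sign(*cells) for cells in table)
    pooled = sign(*(sum(column) for column in zip(*table)))
    return first == second and pooled != first


def estimate_probability(trials, seed=0):
    """Fraction of random tables that show a reversal (assumed uniform model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Eight independent Exp(1) draws, once normalised, are uniform on the
        # simplex. The sign tests are scale-invariant, so normalisation is skipped.
        table = [tuple(rng.expovariate(1.0) for _ in range(4)) for _ in range(2)]
        hits += is_simpson_reversal(table)
    return hits / trials


estimated = estimate_probability(50_000)
print(f"estimated probability of a reversal: {estimated:.4f} (1/60 = {1 / 60:.4f})")
```

The checker can also be applied to observed counts; for example, the kidney stone data correspond to `[(81, 6, 234, 36), (192, 71, 55, 25)]`, for which it reports a reversal.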
Simpson's second paradox
A second, less well-known paradox was also discussed in Simpson's 1951 paper. It can occur when the "sensible interpretation" is not necessarily found in the separated data, as in the kidney stone example, but can instead reside in the combined data. Whether the partitioned or combined form of the data should be used hinges on the process giving rise to the data, meaning the correct interpretation of the data cannot always be determined by simply observing the tables.[26]
Judea Pearl has shown that, in order for the partitioned data to represent the correct causal relationships between any two variables, X and Y, the partitioning variables must satisfy a graphical condition called the "back-door criterion":[27][28]
They must block all spurious paths between X and Y
No variable can be affected by X
This criterion provides an algorithmic solution to Simpson's second paradox, and explains why the correct interpretation cannot be determined by data alone; two different graphs, both compatible with the data, may dictate two different back-door criteria.
When the back-door criterion is satisfied by a set Z of covariates, the adjustment formula (see Confounding) gives the correct causal effect of X on Y. If no such set exists, Pearl's do-calculus can be invoked to discover other ways of estimating the causal effect.[4][29] The completeness of do-calculus[30][29] can be viewed as offering a complete resolution of Simpson's paradox.
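The adjustment formula averages the stratum-specific success rates over the overall distribution of the covariate: P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) P(Z = z). As a sketch, it can be applied to the kidney stone counts with Z taken to be stone size. Treating stone size as a set that satisfies the back-door criterion is an assumption made here for illustration, and the names `data` and `adjusted_success` are hypothetical.

```python
from fractions import Fraction

# (successes, patients) by stone size and treatment, from the kidney stone table.
data = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def adjusted_success(treatment):
    """Back-door adjustment: sum over z of P(success | treatment, z) * P(z)."""
    total_patients = sum(n for stratum in data.values() for _, n in stratum.values())
    effect = Fraction(0)
    for stratum in data.values():
        stratum_size = sum(n for _, n in stratum.values())  # patients with this stone size
        successes, n = stratum[treatment]
        effect += Fraction(successes, n) * Fraction(stratum_size, total_patients)
    return effect


for treatment in ("A", "B"):
    print(f"adjusted success rate of treatment {treatment}: "
          f"{float(adjusted_success(treatment)):.3f}")
```

Under that assumption the adjusted rates favor treatment A, in agreement with the stratified comparison rather than the pooled one. With a different causal graph, the pooled figures could instead be the appropriate ones, which is the point of the second paradox.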
Criticism
One criticism is that the paradox is not really a paradox at all, but rather a failure to properly account for confounding variables or to consider causal relationships between variables.[31]
Another criticism of the apparent Simpson's paradox is that it may be a result of the specific way that data is stratified or grouped. The phenomenon may disappear or even reverse if the data is stratified differently or if different confounding variables are considered. Simpson's example actually highlighted a phenomenon called noncollapsibility,[32] which occurs when subgroups with high proportions do not make simple averages when combined. This suggests that the paradox may not be a universal phenomenon, but rather a specific instance of a more general statistical issue.
Critics of the apparent Simpson's paradox also argue that the focus on the paradox may distract from more important statistical issues, such as the need for careful consideration of confounding variables and causal relationships when interpreting data.[33]
Despite these criticisms, the apparent Simpson's paradox remains a popular and intriguing topic in statistics and data analysis. It continues to be studied and debated by researchers and practitioners in a wide range of fields, and it serves as a valuable reminder of the importance of careful statistical analysis and the potential pitfalls of simplistic interpretations of data.
References
Kievit, Rogier A.; Frankenhuis, Willem E.; Waldorp, Lourens J.; Borsboom, Denny. "Simpson's paradox in psychological science: a practical guide". https://doi.org/10.3389/fpsyg.2013.00513
Wardrop, Robert L. (February 1995). "Simpson's Paradox and the Hot Hand in Basketball". The American Statistician. 49 (1): 24–28.
Simpson, Edward H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Series B. 13 (2): 238–241. doi:10.1111/j.2517-6161.1951.tb00088.x.
Blyth, Colin R. (June 1972). "On Simpson's Paradox and the Sure-Thing Principle". Journal of the American Statistical Association. 67 (338): 364–366. doi:10.2307/2284382. JSTOR 2284382.
Jaynes, E. T.; Bretthorst, G. Larry (2003). "8.10 Pooling the data". Probability Theory: The Logic of Science. Cambridge, UK; New York, NY: Cambridge University Press. ISBN 978-0-521-59271-0.
Ross, Ken (2004). A Mathematician at the Ballpark: Odds and Probabilities for Baseball Fans. Pi Press. ISBN 0-13-147990-3. pp. 12–13.
Pearl, J.; Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. New York, NY: Basic Books.
Shpitser, I.; Pearl, J. (2006). Dechter, R.; Richardson, T. S. (eds.). "Identification of Conditional Interventional Distributions". Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. Corvallis, OR: AUAI Press: 437–444.
The Wall Street Journal column "The Numbers Guy" for December 2, 2009, dealt with recent instances of Simpson's paradox in the news, notably one arising in the comparison of unemployment rates in the 2009 recession with those in the 1983 recession.