{
  "__type": "IngestedDoc",
  "__tag": 4010,
  "_content": {
    "Notes": {
      "__type": "Section",
      "__tag": 4015,
      "children": [
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "The correlation coefficient is calculated as follows:"
            }
          ]
        },
        {
          "__type": "Math",
          "__tag": 4058,
          "value": "r = \\frac{\\sum (x - m_x) (y - m_y)}\n         {\\sqrt{\\sum (x - m_x)^2 \\sum (y - m_y)^2}}"
        },
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "where "
            },
            {
              "__type": "InlineMath",
              "__tag": 4057,
              "value": "m_x"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " is the mean of the vector x and "
            },
            {
              "__type": "InlineMath",
              "__tag": 4057,
              "value": "m_y"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " is the mean of the vector y."
            }
          ]
        },
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "Under the assumption that x and y are drawn from independent normal distributions (so the population correlation coefficient is 0), the probability density function of the sample correlation coefficient r is ("
            },
            {
              "__type": "FootnoteReference",
              "__tag": 4066,
              "label": "1"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": ", "
            },
            {
              "__type": "FootnoteReference",
              "__tag": 4066,
              "label": "2"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "):"
            }
          ]
        },
        {
          "__type": "Math",
          "__tag": 4058,
          "value": "f(r) = \\frac{{(1-r^2)}^{n/2-2}}{\\mathrm{B}(\\frac{1}{2},\\frac{n}{2}-1)}"
        },
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "where n is the number of samples, and B is the beta function.  This is sometimes referred to as the exact distribution of r.  This is the distribution that is used in "
            },
            {
              "__type": "InlineRole",
              "__tag": 4003,
              "value": "pearsonr",
              "domain": null,
              "role": null,
              "inventory": null
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " to compute the p-value. The distribution is a beta distribution on the interval [-1, 1], with equal shape parameters a = b = n/2 - 1.  In terms of SciPy's implementation of the beta distribution, the distribution of r is:"
            }
          ]
        },
        {
          "__type": "Code",
          "__tag": 4050,
          "value": "dist = scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2)",
          "execution_status": null
        },
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "The p-value returned by "
            },
            {
              "__type": "InlineRole",
              "__tag": 4003,
              "value": "pearsonr",
              "domain": null,
              "role": null,
              "inventory": null
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " is a two-sided p-value. The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. More precisely, for a given sample with correlation coefficient r, the p-value is the probability that abs(r') of a random sample x' and y' drawn from the population with zero correlation would be greater than or equal to abs(r). In terms of the object "
            },
            {
              "__type": "InlineCode",
              "__tag": 4051,
              "value": "dist"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " shown above, the p-value for a given r and length n can be computed as:"
            }
          ]
        },
        {
          "__type": "Code",
          "__tag": 4050,
          "value": "p = 2*dist.cdf(-abs(r))",
          "execution_status": null
        },
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "When n is 2, the above continuous distribution is not well-defined. One can interpret the limit of the beta distribution as the shape parameters a and b approach a = b = 0 as a discrete distribution with equal probability masses at r = 1 and r = -1.  More directly, one can observe that, given the data x = [x1, x2] and y = [y1, y2], and assuming x1 != x2 and y1 != y2, the only possible values for r are 1 and -1.  Because abs(r') for any sample x' and y' with length 2 will be 1, the two-sided p-value for a sample of length 2 is always 1."
            }
          ]
        }
      ],
      "title": [],
      "level": 0,
      "target": null
    },
    "Warns": {
      "__type": "Section",
      "__tag": 4015,
      "children": [
        {
          "__type": "Parameters",
          "__tag": 4026,
          "children": [
            {
              "__type": "DocParam",
              "__tag": 4016,
              "name": "",
              "annotation": "`~scipy.stats.ConstantInputWarning`",
              "desc": [
                {
                  "__type": "Paragraph",
                  "__tag": 4045,
                  "children": [
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": "Issued if an input is a constant array.  The correlation coefficient is not defined in this case, so "
                    },
                    {
                      "__type": "InlineCode",
                      "__tag": 4051,
                      "value": "np.nan"
                    },
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": " is returned."
                    }
                  ]
                }
              ]
            },
            {
              "__type": "DocParam",
              "__tag": 4016,
              "name": "",
              "annotation": "`~scipy.stats.NearConstantInputWarning`",
              "desc": [
                {
                  "__type": "Paragraph",
                  "__tag": 4045,
                  "children": [
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": "Issued if an input is \"nearly\" constant.  The array "
                    },
                    {
                      "__type": "InlineCode",
                      "__tag": 4051,
                      "value": "x"
                    },
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": " is considered nearly constant if "
                    },
                    {
                      "__type": "InlineCode",
                      "__tag": 4051,
                      "value": "norm(x - mean(x)) < 1e-13 * abs(mean(x))"
                    },
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": ". Numerical errors in the calculation "
                    },
                    {
                      "__type": "InlineCode",
                      "__tag": 4051,
                      "value": "x - mean(x)"
                    },
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": " in this case might result in an inaccurate calculation of r."
                    }
                  ]
                }
              ]
            }
          ]
        }
      ],
      "title": [],
      "level": 0,
      "target": null
    },
    "Raises": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    },
    "Yields": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    },
    "Methods": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    },
    "Returns": {
      "__type": "Section",
      "__tag": 4015,
      "children": [
        {
          "__type": "Parameters",
          "__tag": 4026,
          "children": [
            {
              "__type": "DocParam",
              "__tag": 4016,
              "name": "r",
              "annotation": "float",
              "desc": [
                {
                  "__type": "Paragraph",
                  "__tag": 4045,
                  "children": [
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": "Pearson's correlation coefficient."
                    }
                  ]
                }
              ]
            },
            {
              "__type": "DocParam",
              "__tag": 4016,
              "name": "p-value",
              "annotation": "float",
              "desc": [
                {
                  "__type": "Paragraph",
                  "__tag": 4045,
                  "children": [
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": "Two-tailed p-value."
                    }
                  ]
                }
              ]
            }
          ]
        }
      ],
      "title": [],
      "level": 0,
      "target": null
    },
    "Summary": {
      "__type": "Section",
      "__tag": 4015,
      "children": [
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "Pearson correlation coefficient and p-value for testing non-correlation."
            }
          ]
        }
      ],
      "title": [],
      "level": 0,
      "target": null
    },
    "Receives": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    },
    "Warnings": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    },
    "Attributes": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    },
    "Parameters": {
      "__type": "Section",
      "__tag": 4015,
      "children": [
        {
          "__type": "Parameters",
          "__tag": 4026,
          "children": [
            {
              "__type": "DocParam",
              "__tag": 4016,
              "name": "x",
              "annotation": "(N,) array_like",
              "desc": [
                {
                  "__type": "Paragraph",
                  "__tag": 4045,
                  "children": [
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": "Input array."
                    }
                  ]
                }
              ]
            },
            {
              "__type": "DocParam",
              "__tag": 4016,
              "name": "y",
              "annotation": "(N,) array_like",
              "desc": [
                {
                  "__type": "Paragraph",
                  "__tag": 4045,
                  "children": [
                    {
                      "__type": "Text",
                      "__tag": 4046,
                      "value": "Input array."
                    }
                  ]
                }
              ]
            }
          ]
        }
      ],
      "title": [],
      "level": 0,
      "target": null
    },
    "Extended Summary": {
      "__type": "Section",
      "__tag": 4015,
      "children": [
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "The Pearson correlation coefficient "
            },
            {
              "__type": "FootnoteReference",
              "__tag": 4066,
              "label": "1"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " measures the linear relationship between two datasets.  The calculation of the p-value relies on the assumption that each dataset is normally distributed.  (See Kowalski "
            },
            {
              "__type": "FootnoteReference",
              "__tag": 4066,
              "label": "3"
            },
            {
              "__type": "Text",
              "__tag": 4046,
              "value": " for a discussion of the effects of non-normality of the input on the distribution of the correlation coefficient.)  Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship."
            }
          ]
        }
      ],
      "title": [],
      "level": 0,
      "target": null
    },
    "Other Parameters": {
      "__type": "Section",
      "__tag": 4015,
      "children": [],
      "title": [],
      "level": 0,
      "target": null
    }
  },
  "_ordered_sections": [
    "Summary",
    "Extended Summary",
    "Parameters",
    "Attributes",
    "Methods",
    "Returns",
    "Yields",
    "Receives",
    "Other Parameters",
    "Raises",
    "Warns",
    "Warnings",
    "Notes"
  ],
  "item_file": "/scipy/stats/_mstats_basic.py",
  "item_line": 403,
  "item_type": "function",
  "aliases": [
    "scipy.stats._mstats_basic.pearsonr"
  ],
  "example_section_data": {
    "__type": "Section",
    "__tag": 4015,
    "children": [
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "import numpy as np\nfrom scipy import stats\nfrom scipy.stats import mstats\n",
        "execution_status": "success"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "mstats.pearsonr([1, 2, 3, 4, 5], [10, 9, 2.5, 6, 4])\n",
        "execution_status": "failure"
      },
      {
        "__type": "Text",
        "__tag": 4046,
        "value": "\nThere is a linear dependence between x and y if y = a + b*x + e, where\na and b are constants and e is a random error term, assumed to be independent\nof x. For simplicity, assume that x is standard normal, a=0, b=1 and let\ne follow a normal distribution with mean zero and standard deviation s>0.\n\n"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "s = 0.5\nx = stats.norm.rvs(size=500)\ne = stats.norm.rvs(scale=s, size=500)\ny = x + e\n",
        "execution_status": "success"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "mstats.pearsonr(x, y)\n",
        "execution_status": "failure"
      },
      {
        "__type": "Text",
        "__tag": 4046,
        "value": "\nThis should be close to the exact value given by\n\n"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "1/np.sqrt(1 + s**2)\n",
        "execution_status": "failure"
      },
      {
        "__type": "Text",
        "__tag": 4046,
        "value": "\nFor s=0.5, we observe a high level of correlation. In general, a large\nvariance of the noise reduces the correlation, while the correlation\napproaches one as the variance of the error goes to zero.\n\nIt is important to keep in mind that zero correlation does not imply\nindependence unless (x, y) is jointly normal. Correlation can even be zero\nwhen there is a very simple dependence structure: if x follows a\nstandard normal distribution and y = abs(x), then the correlation\nbetween x and y is zero. Indeed, since the expectation of x is zero,\ncov(x, y) = E[x*y]. By definition, this equals E[x*abs(x)], which is zero\nby symmetry. The following lines of code illustrate this observation:\n\n"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "y = np.abs(x)\n",
        "execution_status": "success"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "mstats.pearsonr(x, y)\n",
        "execution_status": "failure"
      },
      {
        "__type": "Text",
        "__tag": 4046,
        "value": "\nA non-zero correlation coefficient can be misleading. For example, if x has\na standard normal distribution, define y = x if x < 0 and y = 0 otherwise.\nSince cov(x, y) = 1/2 and var(y) = 1/2 - 1/(2*pi), a simple calculation\nshows that corr(x, y) = sqrt(pi/(2*(pi - 1))) = 0.856...,\nimplying a high level of correlation:\n\n"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "y = np.where(x < 0, x, 0)\n",
        "execution_status": "success"
      },
      {
        "__type": "Code",
        "__tag": 4050,
        "value": "mstats.pearsonr(x, y)\n",
        "execution_status": "failure"
      },
      {
        "__type": "Text",
        "__tag": 4046,
        "value": "\nThis is unintuitive since y does not depend on x when x is larger\nthan zero, which happens in about half of the cases if we sample x and y."
      }
    ],
    "title": [],
    "level": 0,
    "target": null
  },
  "see_also": [
    {
      "__type": "SeeAlsoItem",
      "__tag": 4028,
      "name": {
        "__type": "CrossRef",
        "__tag": 4002,
        "value": "kendalltau",
        "reference": {
          "__type": "RefInfo",
          "__tag": 4000,
          "module": "current-module",
          "version": "current-version",
          "kind": "to-resolve",
          "path": "kendalltau"
        },
        "kind": "module"
      },
      "descriptions": [
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "Kendall's tau, a correlation measure for ordinal data."
            }
          ]
        }
      ],
      "type": null
    },
    {
      "__type": "SeeAlsoItem",
      "__tag": 4028,
      "name": {
        "__type": "CrossRef",
        "__tag": 4002,
        "value": "spearmanr",
        "reference": {
          "__type": "RefInfo",
          "__tag": 4000,
          "module": "current-module",
          "version": "current-version",
          "kind": "to-resolve",
          "path": "spearmanr"
        },
        "kind": "module"
      },
      "descriptions": [
        {
          "__type": "Paragraph",
          "__tag": 4045,
          "children": [
            {
              "__type": "Text",
              "__tag": 4046,
              "value": "Spearman rank-order correlation coefficient."
            }
          ]
        }
      ],
      "type": null
    }
  ],
  "signature": {
    "__type": "SignatureNode",
    "__tag": 4029,
    "kind": "function",
    "parameters": [
      {
        "__type": "SigParam",
        "__tag": 4030,
        "name": "x",
        "annotation": {
          "__type": "Empty",
          "__tag": 4031
        },
        "kind": "POSITIONAL_OR_KEYWORD",
        "default": {
          "__type": "Empty",
          "__tag": 4031
        }
      },
      {
        "__type": "SigParam",
        "__tag": 4030,
        "name": "y",
        "annotation": {
          "__type": "Empty",
          "__tag": 4031
        },
        "kind": "POSITIONAL_OR_KEYWORD",
        "default": {
          "__type": "Empty",
          "__tag": 4031
        }
      }
    ],
    "return_annotation": {
      "__type": "Empty",
      "__tag": 4031
    },
    "target_name": "pearsonr"
  },
  "references": [
    ".. [1] \"Pearson correlation coefficient\", Wikipedia,",
    "       https://en.wikipedia.org/wiki/Pearson_correlation_coefficient",
    ".. [2] Student, \"Probable error of a correlation coefficient\",",
    "       Biometrika, Volume 6, Issue 2-3, 1 September 1908, pp. 302-310.",
    ".. [3] C. J. Kowalski, \"On the Effects of Non-Normality on the Distribution",
    "       of the Sample Product-Moment Correlation Coefficient\",",
    "       Journal of the Royal Statistical Society. Series C (Applied",
    "       Statistics), Vol. 21, No. 1 (1972), pp. 1-12."
  ],
  "qa": "scipy.stats._mstats_basic:pearsonr",
  "arbitrary": [],
  "local_refs": [
    "p-value",
    "r",
    "x",
    "y"
  ]
}