benfords_law package

benfords_law.benfords_law module

Module contents

class benfords_law.BenfordsLaw(data: Union[list, numpy.array, pandas.core.series.Series])

Bases: object

Newcomb-Benford’s Law Analysis

Takes a list or array of numbers representing a real-world dataset and assesses whether it fits Newcomb-Benford’s Law (also known as the Law of Anomalous Numbers). Fit is currently determined either by running a statistical goodness-of-fit test or by a visual test that plots the actual distribution of first significant digits in the dataset against the distribution expected under Benford’s Law.
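The expected distribution mentioned above follows from the standard formula P(d) = log10(1 + 1/d) for d = 1..9. The sketch below shows how those proportions can be computed with the standard library. The function name `benford_expected` is hypothetical and is not part of this package. The string keys are an assumption chosen to mirror the `Dict[str, ...]` return types documented below.

```python
import math


def benford_expected():
    """Return the expected first-digit proportions under Benford's Law.

    Hypothetical helper for illustration; not part of benfords_law.
    """
    # P(d) = log10(1 + 1/d) for each leading digit d = 1..9
    return {str(d): math.log10(1 + 1 / d) for d in range(1, 10)}


expected = benford_expected()
for digit, proportion in expected.items():
    print(f"{digit}: {proportion:.3f}")
```

The nine proportions sum to 1, with the digit 1 expected to lead roughly 30.1% of the time.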

apply_benfords_law()

Runs all relevant processes and then applies all tests to the input dataset.

apply_chi_sq_test(alpha=0.05) → Tuple[float, float]

Apply the Chi-Squared goodness-of-fit test to check whether the dataset’s first significant digit distribution meets the expectation of Benford’s Law. The dataset passes the test if the p-value is greater than the specified alpha and fails otherwise.

Parameters: alpha – Optional. Significance level at which the null hypothesis (that the data follows Benford’s Law) is rejected. Default = 0.05
Returns: Chi-Squared statistic, p-value
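To make the pass/fail rule concrete, here is a minimal standard-library sketch of a chi-squared goodness-of-fit test against Benford’s expected frequencies. It is not the package’s implementation, which may well rely on a statistics library. The names `chi2_sf_even_df` and `chi_sq_benford` are hypothetical, and the input is assumed to be a dict of first-digit counts keyed by the strings '1' to '9'. With nine digit categories there are 8 degrees of freedom, and for an even number of degrees of freedom the chi-squared survival function has a closed form, which keeps the example dependency-free.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function of the chi-squared distribution for even df.

    For df = 2k: sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def chi_sq_benford(counts, alpha=0.05):
    """Return (statistic, p_value, passed) for first-digit counts.

    Hypothetical helper; `counts` maps '1'..'9' to observed frequencies.
    """
    n = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # expected count for digit d
        observed = counts.get(str(d), 0)
        stat += (observed - expected) ** 2 / expected
    p_value = chi2_sf_even_df(stat, df=8)  # 9 categories -> 8 degrees of freedom
    return stat, p_value, p_value > alpha


# Counts close to Benford's proportions for n = 1000
benford_like = {"1": 301, "2": 176, "3": 125, "4": 97, "5": 79,
                "6": 67, "7": 58, "8": 51, "9": 46}
# Uniform counts, which Benford's Law does not predict
uniform = {str(d): 111 for d in range(1, 10)}

print(chi_sq_benford(benford_like))
print(chi_sq_benford(uniform))
```

The first dataset should pass at alpha = 0.05 and the uniform one should fail, matching the rule that a p-value greater than alpha counts as a pass.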

apply_visual_test(figsize: Tuple[int, int] = (15, 7))

Plot the first significant digit distribution against the expectation of Benford’s Law.

Parameters: figsize – Dimensions of the figure to plot in the format: (width, height)

get_counts() → Dict[str, int]

Get the frequency of each first significant digit in the dataset.

Returns: key-value pairs mapping each first significant digit to its frequency
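As an illustration of what counting first significant digits involves, the sketch below extracts the leading non-zero digit of each value and tallies the results. It is an assumption about one reasonable approach, not the package’s actual code. The helpers `first_significant_digit` and `fsd_counts` are hypothetical, and skipping zeros and non-finite values is a choice made here for the example.

```python
import math
from collections import Counter


def first_significant_digit(x):
    """Return the first non-zero digit of x as a string, or None if undefined."""
    if x == 0 or not math.isfinite(x):
        return None  # zero, NaN and infinity have no first significant digit
    # Scientific notation puts the first significant digit first, e.g. 4.5e-03.
    # High precision avoids rounding 9.9999999 up to 1.0e+01.
    return f"{abs(x):.15e}"[0]


def fsd_counts(data):
    """Count first significant digits, returning every key '1'..'9'."""
    digits = (first_significant_digit(x) for x in data)
    counts = Counter(d for d in digits if d is not None)
    return {str(d): counts.get(str(d), 0) for d in range(1, 10)}


sample = [123, 0.0045, -17, 0, 9.5, 1000]
print(fsd_counts(sample))
```

Formatting in scientific notation handles negatives and values below 1 without special cases, which is why it is used here instead of slicing the plain string representation.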

get_distribution() → Dict[str, float]

Get the percentage distribution of first significant digits in the dataset.

Returns: key-value pairs mapping each first significant digit to its percentage

prepare_actual_distribution(get_fsd_counts: bool = False)