benfords_law package

benfords_law.benfords_law module

Module contents

class benfords_law.BenfordsLaw(data: Union[list, numpy.array, pandas.core.series.Series])

Bases: object

Newcomb-Benford’s Law Analysis

Takes a list or array of numbers representing a real-world dataset and assesses whether it fits the Newcomb-Benford Law (also known as the Law of Anomalous Numbers). Fit is currently determined either by running a statistical goodness-of-fit test or by a visual test that plots the actual distribution of first significant digits in the dataset against the distribution expected under Benford’s Law.
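For reference, Benford’s Law gives the expected proportion of first significant digit d as log10(1 + 1/d). The sketch below computes that expected distribution with the standard library only; the function name benford_expected is hypothetical and is not part of this package.

```python
import math


def benford_expected() -> dict:
    """Expected first-digit proportions under Benford's Law.

    Hypothetical helper for illustration, not part of benfords_law.
    Keys are digit strings "1".."9"; values are proportions summing to 1.
    """
    return {str(d): math.log10(1 + 1 / d) for d in range(1, 10)}


expected = benford_expected()
for digit, proportion in expected.items():
    print(digit, round(proportion, 3))
```

The proportions fall from roughly 0.301 for digit 1 to roughly 0.046 for digit 9, which is the curve the actual distribution is compared against.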

apply_benfords_law()

Runs all relevant processes, then applies all tests to the input dataset.

apply_chi_sq_test(alpha=0.05) → Tuple[float, float]

Apply a Chi-Squared goodness-of-fit test to check whether the dataset’s first significant digit distribution matches the expectation of Benford’s Law. The dataset passes the test if the p-value is greater than the specified alpha, and fails otherwise.

Parameters

alpha – Optional. Significance level for the test. The null hypothesis (that the dataset follows Benford’s Law) is rejected when the p-value is not greater than alpha. Default = 0.05

Returns

Chi-Squared statistic, p-value
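As an illustration of the pass/fail rule described above, here is a minimal standard-library sketch of a Chi-Squared goodness-of-fit test against Benford’s Law. It is not the package’s implementation, which may well rely on a statistics library; the function name chi_sq_against_benford and the sample counts are assumptions made for this example. With nine digit categories there are 8 degrees of freedom, and for an even number of degrees of freedom the p-value has a closed form.

```python
import math


def chi_sq_against_benford(observed_counts, alpha=0.05):
    """Hypothetical helper: observed_counts holds counts for digits 1..9."""
    n = sum(observed_counts)
    # Expected counts under Benford's Law: n * log10(1 + 1/d)
    expected = [n * math.log10(1 + 1 / d) for d in range(1, 10)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    # 9 categories give 8 degrees of freedom. For df = 2k the chi-squared
    # survival function is exp(-x/2) * sum_{i<k} (x/2)^i / i!, here k = 4.
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))
    return stat, p_value, p_value > alpha


# Made-up counts that closely follow Benford's Law (n = 1000)
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
# Made-up counts with every digit equally frequent
uniform = [111] * 9

stat_b, p_b, passes_b = chi_sq_against_benford(benford_like)
stat_u, p_u, passes_u = chi_sq_against_benford(uniform)
print(passes_b, passes_u)
```

Note that passing only means the test failed to reject the null hypothesis at the chosen alpha; it does not prove the data follows Benford’s Law.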

apply_visual_test(figsize: Tuple[int, int] = (15, 7))

Plot first significant digit distribution against the expectation of Benford’s Law

Parameters

figsize – Dimensions of the figure to plot in the format: (width, height)

get_counts() → Dict[str, int]

Get frequency of first significant digits passed in the dataset.

Returns

Key-value pairs mapping each first significant digit to its frequency
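To make the return shape concrete, the following sketch counts first significant digits for a small made-up dataset. It assumes zeros are skipped and that every digit 1 to 9 appears as a key; the package may handle these cases differently. The names first_significant_digit and fsd_counts are hypothetical.

```python
from collections import Counter


def first_significant_digit(x) -> str:
    """Return the first non-zero digit of a finite, non-zero number."""
    # Scientific notation places the leading significant digit first,
    # e.g. 0.0042 -> '4.2000000000000000e-03'. Seventeen significant
    # digits avoid rounding values such as 9.9999999 up to 10.
    return f"{abs(x):.16e}"[0]


def fsd_counts(data) -> dict:
    """Count first significant digits, skipping zeros (an assumption)."""
    counts = Counter(first_significant_digit(x) for x in data if x != 0)
    return {str(d): counts.get(str(d), 0) for d in range(1, 10)}


sample = [123, 0.0042, 4500, -19, 0, 1.5]  # made-up values
counts = fsd_counts(sample)
print(counts)
```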

get_distribution() → Dict[str, float]

Get percentage distribution of first significant digits passed in the dataset

Returns

Key-value pairs mapping each first significant digit to its percentage
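A percentage distribution can be derived from digit counts by dividing each count by the total. The sketch below assumes percentages on a 0 to 100 scale, which this documentation does not confirm; the function name fsd_distribution and the counts are made up for illustration.

```python
def fsd_distribution(counts: dict) -> dict:
    """Convert digit counts to percentages (0-100 scale is an assumption)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no first significant digits to summarise")
    return {digit: 100 * n / total for digit, n in counts.items()}


# Made-up counts for digits "1".."9", totalling 100 observations
example_counts = {"1": 30, "2": 18, "3": 12, "4": 10, "5": 8,
                  "6": 7, "7": 6, "8": 5, "9": 4}
distribution = fsd_distribution(example_counts)
print(distribution)
```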

prepare_actual_distribution(get_fsd_counts: bool = False)