util package¶
Subpackages¶
- util.formatting package
- util.text_similarity package
- Subpackages
- Submodules
- util.text_similarity.max_independent_set_calc module
- util.text_similarity.max_independent_set_calc_test module
- PAIRS_LIN_LOG_FUNC()
- CalcNumPairsStatic
- CalcNumPairsLog
- CalcNumPairsLinSquareRoot
- CalcNumPairsRandom
- GraphGenerator
- MeasurementResult
- MeasureIndependentSetCalc
- greedy_calc()
- approx_calc()
- optimal_calc()
- random_graph_gen()
- dense_graph_gen()
- test_find_max_set()
- test_time_complexity()
- ScatterData
- plot_algos()
- test_greedy_accuracy()
- util.text_similarity.texts_similarity_filter module
- util.text_similarity.texts_similarity_filter_test module
- Module contents
Submodules¶
util.csv_json_converter module¶
- util.csv_json_converter.json_to_csv_splitting_tags(json_file_path: Path, columns: list[str], output_file: Path, delete_duplicate_value_columns: list[str], tag_column_key: str = 'tags', tag_columns_prefix: str = 'tag_', order_by: str | None = None, shuffle=True, number_tags=5, max_size=10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)[source]¶
- util.csv_json_converter.json_to_csv(json_file_path: Path, csv_file_path: Path, shuffle=True, max_size=1000000)[source]¶
util.lin_reg_plot_helper module¶
util.merge_datasets module¶
- class util.merge_datasets.FileInformation(file_path: Path, version: int)[source]¶
Bases: object
- file_path: Path¶
- version: int¶
- __init__(file_path: Path, version: int) → None¶
- util.merge_datasets.merge_json_and_assign_uuid(files: list[FileInformation], output: Path)[source]¶
util.number_interval_generator module¶
- class util.number_interval_generator.NumberInterval(lower_bound: int, upper_bound: int)[source]¶
Bases: object
Represents a numeric interval with a lower and upper bound.
- lower_bound¶
The lower limit of the interval.
- Type:
int
- upper_bound¶
The upper limit of the interval.
- Type:
int
- lower_bound: int¶
- upper_bound: int¶
- static create_unbounded_interval()[source]¶
Creates a NumberInterval with no bounds, spanning from negative to positive infinity.
- static create_positive_unbounded_interval()[source]¶
Creates a NumberInterval, spanning from 0 to positive infinity.
- property range: int¶
Returns the range of the interval.
- __init__(lower_bound: int, upper_bound: int) → None¶
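To make the interface above concrete, here is a minimal stand-in for NumberInterval. It is a sketch, not the package's code: it assumes the class behaves like a plain dataclass, that `range` is `upper_bound - lower_bound`, and that "infinity" is represented by a large integer. The names `NumberIntervalSketch` and `SKETCH_INFINITY`, and the sentinel value, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sentinel standing in for "infinity"; the real package may use another value.
SKETCH_INFINITY = 10**100


@dataclass
class NumberIntervalSketch:
    """Illustrative stand-in for util.number_interval_generator.NumberInterval."""

    lower_bound: int
    upper_bound: int

    @staticmethod
    def create_unbounded_interval() -> "NumberIntervalSketch":
        # Spans from "negative infinity" to "positive infinity".
        return NumberIntervalSketch(-SKETCH_INFINITY, SKETCH_INFINITY)

    @staticmethod
    def create_positive_unbounded_interval() -> "NumberIntervalSketch":
        # Spans from 0 to "positive infinity".
        return NumberIntervalSketch(0, SKETCH_INFINITY)

    @property
    def range(self) -> int:
        # Assumption: the range is the distance between the two bounds.
        return self.upper_bound - self.lower_bound


interval = NumberIntervalSketch(lower_bound=3, upper_bound=10)
print(interval.range)
```

Using a large integer as the sentinel keeps both bounds typed as int, which matches the documented signatures; whether the real class computes `range` this way is not stated in this reference.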
- class util.number_interval_generator.NormalizedNumberGenerator(*, mean: float, number_bounds: NumberInterval = NumberInterval(lower_bound=-10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, upper_bound=10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000), standard_deviation: Annotated[float, Gt(gt=0)])[source]¶
Bases: BaseModel
Generates random numbers based on a normal distribution, constrained by a numeric interval.
- mean¶
The mean of the normal distribution.
- Type:
float
- number_bounds¶
The bounds within which generated numbers must fall.
- Type:
NumberInterval
- standard_deviation¶
The standard deviation of the normal distribution.
- Type:
float
- mean: float¶
- number_bounds: NumberInterval¶
- standard_deviation: float¶
- model_post_init(context: Any, /) → None¶
We need to both initialize private attributes and call the user-defined model_post_init method.
- generate_bounded_number() → int[source]¶
Generates a random number that falls within the specified bounds.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'mean': FieldInfo(annotation=float, required=True), 'number_bounds': FieldInfo(annotation=NumberInterval, required=False, default=NumberInterval(lower_bound=-10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, upper_bound=10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)), 'standard_deviation': FieldInfo(annotation=float, required=True, metadata=[Gt(gt=0)])}¶
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- class util.number_interval_generator.NumberIntervalGenerator(*, mean: float, standard_deviation: Annotated[float, Gt(gt=0)], upper_bound_difference_log_factor: Annotated[float, Ge(ge=1)] = 5, min_upper_bound_log_base: Annotated[float, Gt(gt=1)] = 2, lower_number_bounds: NumberInterval = NumberInterval(lower_bound=0, upper_bound=10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000), lower_number_generator: NormalizedNumberGenerator = None)[source]¶
Bases: BaseModel
Generates a random numeric interval based on a normal distribution.
The difference between the lower and upper bound is calculated on a logarithmic scale, so that larger values get a larger difference. For example, with log_base = 2 and factor = 5:
text_length = 8 -> log(8, 2) * 5 = 3 * 5 = 15
text_length = 256 -> log(256, 2) * 5 = 8 * 5 = 40
- mean¶
The mean of the normal distribution.
- Type:
float
- standard_deviation¶
The standard deviation of the normal distribution.
- Type:
float
- upper_bound_difference_log_factor¶
Factor used to determine the upper bound relative to the lower bound.
- Type:
float
- min_upper_bound_log_base¶
Base for the logarithmic calculation of the upper bound.
- Type:
float
- lower_number_bounds¶
Bounds for generating the lower number.
- Type:
NumberInterval
- lower_number_generator¶
Generator for the lower bound value.
- mean: float¶
- standard_deviation: float¶
- upper_bound_difference_log_factor: float¶
- min_upper_bound_log_base: float¶
- lower_number_bounds: NumberInterval¶
- lower_number_generator: NormalizedNumberGenerator¶
- model_post_init(_NumberIntervalGenerator__context: any) → None[source]¶
Initializes the lower number generator if it is not provided.
- generate_interval() → NumberInterval[source]¶
Generates a random numeric interval consisting of a lower and upper bound.
- Returns:
A randomly generated numeric interval.
- Return type:
NumberInterval
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'lower_number_bounds': FieldInfo(annotation=NumberInterval, required=False, default=NumberInterval(lower_bound=0, upper_bound=10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)), 'lower_number_generator': FieldInfo(annotation=NormalizedNumberGenerator, required=False, default=None), 'mean': FieldInfo(annotation=float, required=True), 'min_upper_bound_log_base': FieldInfo(annotation=float, required=False, default=2, metadata=[Gt(gt=1)]), 'standard_deviation': FieldInfo(annotation=float, required=True, metadata=[Gt(gt=0)]), 'upper_bound_difference_log_factor': FieldInfo(annotation=float, required=False, default=5, metadata=[Ge(ge=1)])}¶
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
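The logarithmic rule from the NumberIntervalGenerator description can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the difference is rounded to an integer, and lower values below 1 (where the logarithm is undefined or negative) are given a difference of 0. None of these details are confirmed by the reference.

```python
import math


def upper_bound_difference_sketch(
    lower: int, log_base: float = 2.0, factor: float = 5.0
) -> int:
    """Difference between the bounds: log(lower, log_base) * factor, rounded."""
    if lower < 1:
        # Assumption: no widening when the logarithm is undefined or negative.
        return 0
    return round(math.log(lower, log_base) * factor)


def interval_from_lower_sketch(
    lower: int, log_base: float = 2.0, factor: float = 5.0
) -> tuple[int, int]:
    """Build (lower_bound, upper_bound) from a given lower value."""
    return lower, lower + upper_bound_difference_sketch(lower, log_base, factor)


# Reproduces the two worked examples from the class description.
for text_length in (8, 256):
    print(text_length, upper_bound_difference_sketch(text_length))
    print(interval_from_lower_sketch(text_length))
```

With the documented defaults (base 2, factor 5), a lower value of 8 gives a difference of 15 and a lower value of 256 gives 40, so the gap grows much more slowly than the lower value itself.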
util.number_interval_generator_test module¶
- util.number_interval_generator_test.get_random_intervals(mean: int, standard_deviation: int, _lower_number_min_value: int = 0) → list[NumberInterval][source]¶
- util.number_interval_generator_test.get_random_numbers(mean: int, standard_deviation: int, _lower_number_min_value: int = 0) → list[int][source]¶
- util.number_interval_generator_test.test_random_interval_distribution(text_length_mean, text_length_standard_deviation)[source]¶