util.text_similarity package¶
Subpackages¶
- util.text_similarity.lsh_min_hash package
- Submodules
- util.text_similarity.lsh_min_hash.lsh_min_hash module
- util.text_similarity.lsh_min_hash.lsh_min_hash_test module
- util.text_similarity.lsh_min_hash.shingles_generator module
- util.text_similarity.lsh_min_hash.time_analysis module
RandomTextsGeneratorRandomTextsGenerator.average_wordsRandomTextsGenerator.number_total_available_wordsRandomTextsGenerator.available_wordsRandomTextsGenerator.average_wordsRandomTextsGenerator.number_total_available_wordsRandomTextsGenerator.available_wordsRandomTextsGenerator.get_random_num_words()RandomTextsGenerator.generate_random_texts()RandomTextsGenerator.__init__()
MeasurementParamsMeasureLSHTimeComplexity
- Module contents
Submodules¶
util.text_similarity.max_independent_set_calc module¶
- class util.text_similarity.max_independent_set_calc.OptimalIndependentSetCalc[source]¶
Bases:
MaxIndependentSetCalc
- class util.text_similarity.max_independent_set_calc.ApproximateIndependentSetCalc[source]¶
Bases:
MaxIndependentSetCalc
- class util.text_similarity.max_independent_set_calc.GreedyIndependentSetCalc[source]¶
Bases:
MaxIndependentSetCalc- find_max_set(num_texts: int, similar_pairs: list[tuple[int, int]]) set[int][source]¶
Greedy algorithm to find an approximate maximum independent set by iteratively removing the node with the highest degree and its edges. :param num_texts: Number of vertices (texts) :param similar_pairs: List of edges representing pairs of similar texts :return: Set of vertices in the approximate maximum independent set
util.text_similarity.max_independent_set_calc_test module¶
- util.text_similarity.max_independent_set_calc_test.PAIRS_LIN_LOG_FUNC(x)¶
- class util.text_similarity.max_independent_set_calc_test.StaticCalcNumPairs(num_pairs)[source]¶
Bases:
CalcNumPairs
- class util.text_similarity.max_independent_set_calc_test.LogCalcNumPairs[source]¶
Bases:
CalcNumPairs
- class util.text_similarity.max_independent_set_calc_test.LinSquareRootCalcNumPairs[source]¶
Bases:
CalcNumPairs
- class util.text_similarity.max_independent_set_calc_test.RandomGraphGenerator(_calc_num_pairs: util.text_similarity.max_independent_set_calc_test.CalcNumPairs = <util.text_similarity.max_independent_set_calc_test.LogCalcNumPairs object at 0x000001DA595BF770>)[source]¶
Bases:
object- __init__(_calc_num_pairs: ~util.text_similarity.max_independent_set_calc_test.CalcNumPairs = <util.text_similarity.max_independent_set_calc_test.LogCalcNumPairs object>) None¶
- class util.text_similarity.max_independent_set_calc_test.MeasurementResult(num_texts: int, time: float, result_size: int)[source]¶
Bases:
object- num_texts: int¶
- time: float¶
- result_size: int¶
- __init__(num_texts: int, time: float, result_size: int) None¶
- class util.text_similarity.max_independent_set_calc_test.MeasureIndependentSetCalc(calc: util.text_similarity.max_independent_set_calc.MaxIndependentSetCalc, graph_generator: util.text_similarity.max_independent_set_calc_test.RandomGraphGenerator)[source]¶
Bases:
object- calc: MaxIndependentSetCalc¶
- graph_generator: RandomGraphGenerator¶
- analyze_calc_run(num_texts) MeasurementResult[source]¶
- analyze_calc_runs(num_texts, iterations) MeasurementResult[source]¶
- __init__(calc: MaxIndependentSetCalc, graph_generator: RandomGraphGenerator) None¶
- util.text_similarity.max_independent_set_calc_test.test_find_max_set(num_texts, similar_pairs, acceptable_solutions)[source]¶
- util.text_similarity.max_independent_set_calc_test.test_time_complexity(greedy_calc, dense_graph_gen)[source]¶
- class util.text_similarity.max_independent_set_calc_test.ScatterData(xs: <built-in function array>, ys: <built-in function array>, label: str)[source]¶
Bases:
object- xs: array¶
- ys: array¶
- label: str¶
- __init__(xs: array, ys: array, label: str) None¶
util.text_similarity.texts_similarity_filter module¶
- class util.text_similarity.texts_similarity_filter.TextsSimilarityFilter(lsh_min_hash: util.text_similarity.lsh_min_hash.lsh_min_hash.LSHMinHash, max_diff_set_calc: util.text_similarity.max_independent_set_calc.MaxIndependentSetCalc)[source]¶
Bases:
object- lsh_min_hash: LSHMinHash¶
- max_diff_set_calc: MaxIndependentSetCalc¶
- __init__(lsh_min_hash: LSHMinHash, max_diff_set_calc: MaxIndependentSetCalc) None¶