strategies package#

Submodules#

strategies.AttributeExtractionStrategies module#

class academic_metrics.strategies.AttributeExtractionStrategies.AttributeExtractionStrategy(warning_manager, missing_abstracts_file='missing_abstracts.txt')[source]#

Bases: ABC

Abstract base class for attribute extraction strategies.

This class provides a framework for extracting various attributes from academic publication data. It defines common methods and properties that all specific extraction strategies should implement or utilize.

It implements the Strategy pattern, allowing for flexible implementation of different extraction methods for different types of attributes or data sources. See more on the Strategy pattern here: https://en.wikipedia.org/wiki/Strategy_pattern

logger#

Logger for recording extraction-related events.

Type:: logging.Logger

abstract_pattern#

Regular expression pattern for extracting abstracts.

Type:: re.Pattern

missing_abstracts_file#

File path for storing information about missing abstracts.

Type:: str

warning_manager#

Manages and logs warnings during extraction.

Type:: WarningManager

unknown_authors_dict#

Stores information about unidentified authors.

Type:: dict

unknown_authors_file#

File path for storing information about unknown authors.

Type:: str

crossref_author_key#

Key used to access author information in Crossref data.

Type:: str

extract_attribute()[source]#: Abstract method to be implemented by subclasses for specific attribute extraction.

html_to_markdown()[source]#: Converts HTML content to Markdown format.

get_crossref_author_affils()[source]#: Retrieves author affiliations from Crossref data.

get_author_obj()[source]#: Extracts author information from Crossref JSON data.

set_author_sequence_dict()[source]#: Organizes author information into a structured dictionary.

write_missing_authors_file()[source]#: Writes information about unknown authors to a file.

create_author_sequence_dict()[source]#: Creates a template dictionary for author sequence information.

create_unknown_authors_dict()[source]#: Creates a template dictionary for unknown authors.

log_extraction_warning()[source]#: Logs warnings encountered during attribute extraction.

generate_error_id()[source]#: Generates a unique identifier for error tracking.

get_authors_as_list()[source]#: Converts the author sequence dictionary to a list of author names.

Design:: This class is designed as an abstract base class, providing a common interface and shared functionality for various attribute extraction strategies. It uses the Strategy pattern, allowing for flexible implementation of different extraction methods for different types of attributes or data sources.
Summary:: The AttributeExtractionStrategy class serves as a foundation for creating specific attribute extraction strategies in an academic publication data processing system. It provides utility methods for handling common tasks such as HTML parsing, author information processing, and error logging, while defining an interface for implementing specific extraction logic in subclasses.

__init__(warning_manager, missing_abstracts_file='missing_abstracts.txt')[source]#

Initializes the AttributeExtractionStrategy with necessary components and configurations.

This constructor sets up the basic infrastructure needed for attribute extraction, including logging, file paths for storing missing data, and utilities for managing warnings and unknown author information.

Parameters:

warning_manager (WarningManager) – An instance of WarningManager for handling and logging warnings encountered during the extraction process.
missing_abstracts_file (str, optional) – The file path where information about missing abstracts will be stored. Defaults to “missing_abstracts.txt”.

Returns:

None

Design:: The constructor initializes various attributes of the class, setting up the logger, compiling regular expressions, and preparing data structures for handling unknown authors and missing abstracts. It’s designed to provide a consistent starting point for all subclasses of AttributeExtractionStrategy.
Summary:: This method prepares an instance of AttributeExtractionStrategy (or its subclass) for operation by setting up necessary tools and configurations for attribute extraction. It ensures that each strategy has access to logging, warning management, and file storage for handling edge cases and errors in the extraction process.

abstract extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (str) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

Any

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.
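
As a rough illustration of the contract described above, the sketch below defines a minimal abstract strategy and one concrete subclass. The names ExtractionStrategy, DoiStrategy and run are hypothetical and are not part of academic_metrics; only the idea of an abstract extract_attribute that callers invoke polymorphically comes from the text.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Tuple


class ExtractionStrategy(ABC):
    """Hypothetical stand-in for AttributeExtractionStrategy."""

    @abstractmethod
    def extract_attribute(self, entry_text: Any) -> Any:
        """Return the attribute extracted from entry_text."""
        raise NotImplementedError


class DoiStrategy(ExtractionStrategy):
    """Hypothetical concrete strategy: reads a 'DOI' key from a dict."""

    def extract_attribute(self, entry_text: dict) -> Tuple[bool, Optional[str]]:
        doi = entry_text.get("DOI")
        return (doi is not None, doi)


def run(strategy: ExtractionStrategy, entry: dict) -> Any:
    # The caller depends only on the abstract interface, not on the subclass.
    return strategy.extract_attribute(entry)


result = run(DoiStrategy(), {"DOI": "10.1000/example"})
```

Because ExtractionStrategy declares an abstract method, instantiating it directly raises TypeError; only subclasses that override extract_attribute can be created.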

html_to_markdown(html_content)[source]#

Converts HTML content to Markdown format.

This method takes HTML content, particularly JATS XML, and converts it to a simplified Markdown format.

Parameters:: html_content (str) – The HTML content to be converted.
Returns:: The converted content in Markdown format.
Return type:: str

Design:: Uses BeautifulSoup for parsing HTML and custom logic to generate Markdown. Handles both sectioned and non-sectioned content.
Summary:: Transforms complex HTML/XML content into more readable Markdown format.
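
The conversion described above can be sketched roughly as follows. This is not the library's implementation: it swaps BeautifulSoup for the standard-library html.parser so that the example has no dependencies, and the class name, function name, and the choice of "###" for section titles are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class _JatsToMarkdown(HTMLParser):
    """Collects JATS/HTML text into Markdown blocks (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._buffer: list[str] = []

    def handle_data(self, data: str) -> None:
        self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        # "jats:title" -> "title", "jats:p" -> "p"; a plain "p" stays "p".
        local_name = tag.split(":")[-1]
        if local_name in ("title", "p"):
            self.flush(as_heading=(local_name == "title"))

    def flush(self, as_heading: bool = False) -> None:
        # Collapse runs of whitespace and emit the buffered text as one block.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            self.blocks.append(f"### {text}" if as_heading else text)


def html_to_markdown_sketch(html_content: str) -> str:
    parser = _JatsToMarkdown()
    parser.feed(html_content)
    parser.close()
    parser.flush()  # keep any text that was not wrapped in <p> or <title>
    return "\n\n".join(parser.blocks)


sectioned = (
    "<jats:sec><jats:title>Background</jats:title>"
    "<jats:p>Cells <jats:italic>divide</jats:italic>.</jats:p></jats:sec>"
)
markdown = html_to_markdown_sketch(sectioned)
```

Sectioned input yields a heading followed by its paragraph, while input without section titles comes back as plain paragraphs separated by blank lines.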

get_crossref_author_affils(author_item)[source]#

Retrieves author affiliations from Crossref data.

This method extracts the affiliation information for an author from the Crossref author data structure.

It is designed for Crossref data and is NOT compatible with WoS data.

It is implemented in the base class because more than one subclass uses it, for example CrossrefAuthorExtractionStrategy and CrossrefDepartmentExtractionStrategy.

Parameters:: author_item (dict) – A dictionary containing author information from Crossref.
Returns:: A list of affiliation names for the author.
Return type:: list[str]

Design:: Directly accesses the ‘affiliation’ key in the author dictionary. Extracts the ‘name’ field from each affiliation entry.
Summary:: Extracts and returns a list of affiliation names for a given author.
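
A minimal sketch of this lookup, assuming the usual Crossref shape in which each author item carries an 'affiliation' list of dictionaries with a 'name' field. The function name is hypothetical, and the tolerant handling of a missing key is an addition of this sketch; the documented method accesses the key directly.

```python
def affiliation_names(author_item: dict) -> list[str]:
    # Each Crossref affiliation entry is a dict such as {"name": "..."}.
    affiliations = author_item.get("affiliation", [])
    return [entry["name"] for entry in affiliations if "name" in entry]


author = {
    "given": "Ada",
    "family": "Lovelace",
    "affiliation": [
        {"name": "Department of Mathematics"},
        {"name": "School of Computing"},
    ],
}
names = affiliation_names(author)
```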

get_author_obj(*, crossref_json)[source]#

Extracts the author object from Crossref JSON data.

This method retrieves the author information from the Crossref JSON structure. It is designed to be used with Crossref data, not WoS data.

Parameters:: crossref_json (dict) – The Crossref JSON data containing author information.
Returns:: A list of dictionaries, each containing information about an author.
Return type:: list[dict]

Design:: Uses the class attribute ‘crossref_author_key’ to access author information.
Summary:: Extracts and returns the author object from Crossref JSON data.
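
The signature above is keyword-only (note the bare `*`), so the argument must be passed by name. A small sketch of that behavior follows; the value "author" for the key and the fallback to an empty list are assumptions, since this page does not show the value of crossref_author_key.

```python
CROSSREF_AUTHOR_KEY = "author"  # assumed value of crossref_author_key


def get_author_obj(*, crossref_json: dict) -> list[dict]:
    # Returns the list of author items, or an empty list if the key is absent.
    return crossref_json.get(CROSSREF_AUTHOR_KEY, [])


record = {"author": [{"given": "Ada", "family": "Lovelace", "sequence": "first"}]}
authors = get_author_obj(crossref_json=record)
```

Calling get_author_obj(record) without the keyword raises TypeError, which guards against passing the wrong structure positionally.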

create_author_sequence_dict()[source]#

Creates a template dictionary for author sequence information.

This method initializes a dictionary structure to store information about the first author and additional authors.

Returns:: A dictionary with keys for ‘first’ author and ‘additional’ authors.
Return type:: dict

Design:: Provides a consistent structure for storing author information.
Summary:: Creates and returns a template dictionary for organizing author information.

create_unknown_authors_dict()[source]#

Creates a template dictionary for unknown authors.

This method initializes a dictionary to store information about authors that couldn’t be properly processed.

Returns:: A dictionary with a key for ‘unknown_authors’.
Return type:: dict

Design:: Provides a consistent structure for storing information about problematic authors.
Summary:: Creates and returns a template dictionary for tracking unknown authors.
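
The two template helpers above might look roughly like this. Only the top-level keys 'first', 'additional' and 'unknown_authors' come from the documentation; the inner field names 'author_name' and 'affiliations' are assumptions made for this sketch.

```python
def create_author_sequence_dict() -> dict:
    # A fresh dict on every call, so instances never share mutable state.
    return {
        "first": {"author_name": "", "affiliations": []},  # assumed inner shape
        "additional": [],
    }


def create_unknown_authors_dict() -> dict:
    return {"unknown_authors": []}


template = create_author_sequence_dict()
template["additional"].append({"author_name": "Alan Turing", "affiliations": []})
```

Building the template inside a function, rather than sharing one module-level dictionary, means that mutating one template leaves later templates untouched.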

log_extraction_warning(attribute_class_name, warning_message, entry_id=None, line_prefix=None)[source]#

Logs warnings encountered during attribute extraction.

This method creates a standardized log message for extraction warnings and can optionally include specific entry information.

Parameters:

attribute_class_name (str) – The name of the attribute class where the warning occurred.
warning_message (str) – The specific warning message.
entry_id (str, optional) – An identifier for the entry causing the warning.
line_prefix (str, optional) – A prefix to identify specific lines in the entry.

Returns:

None

Design:: Generates a unique error ID for each warning. Commented-out code shows potential for more detailed logging.
Summary:: Creates and logs a standardized warning message for attribute extraction issues.
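
One way to build such a standardized message is sketched below. The message layout, the logger name and the helper build_warning_message are hypothetical; the documentation only states that each warning gets a unique error ID and may include the entry ID and line prefix.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("extraction_sketch")  # hypothetical logger name


def build_warning_message(
    attribute_class_name: str,
    warning_message: str,
    entry_id: Optional[str] = None,
    line_prefix: Optional[str] = None,
) -> str:
    error_id = str(uuid.uuid4())  # unique ID for tracing this warning
    parts = [f"[{error_id}] {attribute_class_name}: {warning_message}"]
    if entry_id is not None:
        parts.append(f"entry_id={entry_id}")
    if line_prefix is not None:
        parts.append(f"line_prefix={line_prefix}")
    return " | ".join(parts)


def log_extraction_warning(
    attribute_class_name: str,
    warning_message: str,
    entry_id: Optional[str] = None,
    line_prefix: Optional[str] = None,
) -> None:
    logger.warning(
        build_warning_message(attribute_class_name, warning_message, entry_id, line_prefix)
    )


message = build_warning_message("TitleStrategy", "no title found", entry_id="e1")
```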

generate_error_id()[source]#

Generates a unique identifier for error tracking.

This method creates a UUID to uniquely identify each error or warning instance.

Returns:: A unique UUID string.
Return type:: str

Design:: Uses Python’s uuid module to generate a version 4 UUID.
Summary:: Generates and returns a unique identifier string for error tracking purposes.
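
Generating a version 4 UUID string with the standard library takes one call, as in this minimal sketch (written as a free function here rather than a method):

```python
import uuid


def generate_error_id() -> str:
    # uuid4() draws a random UUID; str() gives the 36-character hyphenated form.
    return str(uuid.uuid4())


error_id = generate_error_id()
```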

write_missing_authors_file(unknown_authors, unknown_authors_file)[source]#

Writes information about unknown authors to a file.

This method saves the collected records of authors that could not be identified to the given file, so they can be reviewed after extraction.

Parameters:

unknown_authors – The collected information about unknown authors (see unknown_authors_dict).
unknown_authors_file (str) – The file path where this information is written.

Summary:: Writes information about unknown authors to the specified file.

set_author_sequence_dict(*, author_items, author_sequence_dict)[source]#

Organizes author information into a structured dictionary.

This method processes a list of author items and organizes them into a dictionary based on their sequence (first author or additional authors). It is designed to work with Crossref data, not WoS data.

Parameters:

author_items (list[dict]) – A list of dictionaries containing author information.
author_sequence_dict (dict) – A dictionary to store the organized author information.

Return type:

None

Returns:

None

Design:: Processes each author item, extracting name and affiliation information. Handles cases for first author and additional authors separately. Logs warnings for missing or incomplete author information.
Summary:: Organizes author information into a structured dictionary format.
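
The organizing step can be sketched as follows, assuming Crossref author items with 'given', 'family', 'sequence' and 'affiliation' fields. The inner field names 'author_name' and 'affiliations' are assumptions, and where the documented method logs a warning this sketch simply skips the item.

```python
def set_author_sequence_dict(*, author_items: list, author_sequence_dict: dict) -> None:
    for item in author_items:
        given, family = item.get("given"), item.get("family")
        if not given or not family:
            # The documented method logs a warning here; the sketch just skips.
            continue
        entry = {
            "author_name": f"{given} {family}",
            "affiliations": [
                a["name"] for a in item.get("affiliation", []) if "name" in a
            ],
        }
        # Crossref marks the lead author with sequence == "first".
        if item.get("sequence") == "first":
            author_sequence_dict["first"] = entry
        else:
            author_sequence_dict["additional"].append(entry)


items = [
    {"given": "Ada", "family": "Lovelace", "sequence": "first",
     "affiliation": [{"name": "Department of Mathematics"}]},
    {"given": "Alan", "family": "Turing", "sequence": "additional", "affiliation": []},
    {"family": "NoGivenName", "sequence": "additional"},
]
sequence = {"first": None, "additional": []}
set_author_sequence_dict(author_items=items, author_sequence_dict=sequence)
```

The function mutates the dictionary it is given and returns None, matching the documented return type.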

get_authors_as_list(*, author_sequence_dict)[source]#

Converts the author sequence dictionary to a list of author names.

This method extracts author names from the structured author sequence dictionary and returns them as a simple list.

Parameters:: author_sequence_dict (dict) – A dictionary containing structured author information.
Returns:: A list of author names in the order they appear in the publication.
Return type:: list[str]

Design:: Handles both the first author and additional authors from the dictionary.
Summary:: Extracts and returns a list of author names from the structured author dictionary.
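
Flattening the structured dictionary into a list of names might look like the sketch below. The inner field name 'author_name' is an assumption; only the 'first' and 'additional' keys are documented.

```python
def get_authors_as_list(*, author_sequence_dict: dict) -> list[str]:
    names = []
    first = author_sequence_dict.get("first") or {}
    if first.get("author_name"):
        names.append(first["author_name"])  # first author leads the list
    names.extend(
        author["author_name"] for author in author_sequence_dict.get("additional", [])
    )
    return names


sequence = {
    "first": {"author_name": "Ada Lovelace", "affiliations": []},
    "additional": [{"author_name": "Alan Turing", "affiliations": []}],
}
author_names = get_authors_as_list(author_sequence_dict=sequence)
```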

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefTitleExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

A strategy for extracting title information from Crossref data.

This class implements the AttributeExtractionStrategy for title extraction specifically from Crossref JSON data. It focuses on extracting and cleaning the title(s) of a publication.

title_key#

The key used to access title information in the Crossref JSON.

Type:: str

clean_title()[source]#: Removes HTML tags from a title string.

extract_attribute()[source]#: Extracts and cleans the title(s) from the Crossref entry.

Design:: Utilizes BeautifulSoup for HTML tag removal and handles potential multiple titles. Implements the Strategy pattern for title extraction from Crossref data.
Summary:: Provides a specialized strategy for extracting and cleaning publication titles from Crossref data entries, handling potential HTML content and multiple titles.

__init__(warning_manager)[source]#

Initializes the CrossrefTitleExtractionStrategy.

This constructor sets up the strategy with a warning manager and defines the key for accessing title information in Crossref data.

Parameters:: warning_manager (WarningManager) – An instance of WarningManager for handling extraction warnings.
Returns:: None

Design:: Calls the superclass constructor and sets up the title key for Crossref data.
Summary:: Prepares the strategy instance for title extraction from Crossref data.

clean_title(title)[source]#

Removes HTML tags from a title string using BeautifulSoup.

This method uses BeautifulSoup to parse and remove any HTML tags present in the title string.

Parameters:: title (str) – The title string potentially containing HTML tags.
Returns:: The cleaned title string with HTML tags removed.
Return type:: str

Design:: Uses BeautifulSoup with ‘html.parser’ to safely remove HTML tags.
Summary:: Cleans a title string by removing any HTML tags it may contain.

extract_attribute(entry_text)[source]#

Extracts and cleans the title(s) from the Crossref entry.

This method retrieves the title(s) from the Crossref JSON data, handles potential multiple titles, and cleans each title by removing HTML tags.

Parameters:

entry_text (dict) – The Crossref JSON data containing the publication information.

Returns:

A tuple containing:

A boolean indicating success (True) or failure (False) of the extraction.
A list of cleaned title strings, or None if no titles were found.

Return type:

tuple[bool, list[str] | None]

Design:: Retrieves titles using the predefined title_key. Handles cases where a single title or multiple titles may be present. Cleans each title using the clean_title method. Logs a warning if no titles are found.
Summary:: Extracts, cleans, and returns the publication title(s) from Crossref JSON data.
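
The flow described above (look up the titles, normalize a single title to a list, strip tags from each) is sketched below. It uses the standard-library html.parser in place of BeautifulSoup to stay dependency-free; the function names and the default key "title" are assumptions, and the sketch returns instead of logging when no title is found.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TextOnly(HTMLParser):
    """Keeps text nodes and drops every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def clean_title(title: str) -> str:
    parser = _TextOnly()
    parser.feed(title)
    parser.close()  # flushes any trailing text still buffered by the parser
    return "".join(parser.chunks)


def extract_titles(entry: dict, title_key: str = "title") -> Tuple[bool, Optional[List[str]]]:
    titles = entry.get(title_key)
    if not titles:
        return False, None  # the documented method also logs a warning here
    if isinstance(titles, str):
        titles = [titles]  # normalize a single title to a list
    return True, [clean_title(t) for t in titles]


found, cleaned = extract_titles({"title": ["Effects of <i>E. coli</i> on H<sub>2</sub>O"]})
```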

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefAbstractExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

A strategy for extracting abstract information from Crossref data.

This class implements the AttributeExtractionStrategy for abstract extraction specifically from Crossref JSON data. It focuses on extracting and cleaning the abstract of a publication.

abstract_key#

The key used to access abstract information in the Crossref JSON.

Type:: str

clean_abstract()[source]#: Converts HTML content in the abstract to Markdown format.

extract_attribute()[source]#: Extracts and cleans the abstract from the Crossref entry.

Design:: Utilizes the html_to_markdown method for cleaning HTML content in abstracts. Implements the Strategy pattern for abstract extraction from Crossref data.
Summary:: Provides a specialized strategy for extracting and cleaning publication abstracts from Crossref data entries, handling potential HTML content.

__init__(warning_manager)[source]#

Initializes the CrossrefAbstractExtractionStrategy.

This constructor sets up the strategy with a warning manager and defines the key for accessing abstract information in Crossref data.

Parameters:: warning_manager (WarningManager) – An instance of WarningManager for handling extraction warnings.
Returns:: None

Design:: Calls the superclass constructor and sets up the abstract key for Crossref data.
Summary:: Prepares the strategy instance for abstract extraction from Crossref data.

clean_abstract(abstract)[source]#

Cleans the abstract by converting HTML content to Markdown format.

This method uses the html_to_markdown method to convert any HTML content in the abstract to a more readable Markdown format.

Parameters:: abstract (str) – The abstract string potentially containing HTML content.
Returns:: The cleaned abstract string in Markdown format.
Return type:: str

Design:: Utilizes the html_to_markdown method inherited from the parent class.
Summary:: Converts HTML content in the abstract to Markdown for improved readability.

extract_attribute(entry_text)[source]#

Extracts and cleans the abstract from the Crossref entry.

This method retrieves the abstract from the Crossref JSON data, cleans it by converting HTML to Markdown, and returns the result.

Parameters:

entry_text (dict) – The Crossref JSON data containing the publication information.

Returns:

A tuple containing:

A boolean indicating success (True) or failure (False) of the extraction.
The cleaned abstract string, or None if no abstract was found.

Return type:

tuple[bool, str | None]

Design:: Retrieves the abstract using the predefined abstract_key. Cleans the abstract using the clean_abstract method. Logs a warning if no abstract is found.
Summary:: Extracts, cleans, and returns the publication abstract from Crossref JSON data.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefAuthorExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

A strategy for extracting author information from Crossref data.

This class implements the AttributeExtractionStrategy for author extraction specifically from Crossref JSON data. It focuses on extracting and organizing author names and handling cases where author information might be incomplete.

unknown_authors#

A dictionary to store information about authors with incomplete data.

Type:: dict

missing_authors_file#

The file path to store information about unknown authors.

Type:: str

get_author_name()[source]#: Constructs a full author name from given and family name components.

extract_attribute()[source]#: Extracts and organizes author information from the Crossref entry.

Design:: Utilizes helper methods to process individual author items and organize them. Implements the Strategy pattern for author extraction from Crossref data.
Summary:: Provides a specialized strategy for extracting and organizing author information from Crossref data entries, handling potential incomplete author data.

__init__(warning_manager)[source]#

Initializes the CrossrefAuthorExtractionStrategy.

This constructor sets up the strategy with a warning manager and initializes structures for handling unknown authors.

Parameters:: warning_manager (WarningManager) – An instance of WarningManager for handling extraction warnings.
Returns:: None

Design:: Calls the superclass constructor and sets up data structures for unknown authors.
Summary:: Prepares the strategy instance for author extraction from Crossref data.

get_author_name(author_item)[source]#

Constructs a full author name from given and family name components.

This method attempts to create a full author name from the given and family name fields in the author item. If either component is missing, it logs a warning.

Parameters:: author_item (dict) – A dictionary containing author information from Crossref.
Returns:: The full author name if both components are present, None otherwise.
Return type:: str | None

Design:: Extracts given and family names from the author item. Logs a warning if either component is missing.
Summary:: Constructs and returns a full author name, or None if information is incomplete.
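
A minimal sketch of this behavior, assuming the Crossref field names 'given' and 'family' and a simple "given family" ordering. The real method also logs a warning when a component is missing, which is omitted here.

```python
from typing import Optional


def get_author_name(author_item: dict) -> Optional[str]:
    given = author_item.get("given")
    family = author_item.get("family")
    if not given or not family:
        return None  # incomplete name; the documented method logs a warning
    return f"{given} {family}"


full_name = get_author_name({"given": "Ada", "family": "Lovelace"})
```

Callers need to handle the None case explicitly, for example by recording the item among the unknown authors.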

extract_attribute(crossref_json)[source]#

Extracts and organizes author information from the Crossref entry.

This method processes the Crossref JSON data to extract author information, organizes it into a structured format, and returns a list of author names.

Parameters:

crossref_json (dict) – The Crossref JSON data containing the publication information.

Returns:

A tuple containing:

A boolean indicating success (True) or failure (False) of the extraction.
A list of author names extracted from the Crossref data.

Return type:

tuple[bool, list[str]]

Design:: Uses helper methods to extract author objects and organize them into a sequence. Converts the organized author data into a simple list of names.
Summary:: Extracts, organizes, and returns a list of author names from Crossref JSON data.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefDepartmentExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

A strategy for extracting department information from Crossref data.

This class implements the AttributeExtractionStrategy for department extraction specifically from Crossref JSON data. It focuses on extracting and organizing department affiliations for each author in a publication.

extract_attribute()[source]#: Extracts and organizes department affiliations from the Crossref entry.

Design:: Utilizes helper methods to process author information and extract department affiliations. Implements the Strategy pattern for department extraction from Crossref data.
Summary:: Provides a specialized strategy for extracting and organizing department affiliations for authors from Crossref data entries.

__init__(warning_manager)[source]#

Initializes the CrossrefDepartmentExtractionStrategy.

This constructor sets up the strategy with a warning manager.

Parameters:: warning_manager (WarningManager) – An instance of WarningManager for handling extraction warnings.
Returns:: None

Design:: Calls the superclass constructor to set up the warning manager.
Summary:: Prepares the strategy instance for department extraction from Crossref data.

extract_attribute(crossref_json)[source]#

Extracts and organizes department affiliations from the Crossref entry.

This method processes the Crossref JSON data to extract department affiliations for each author, organizing them into a dictionary structure.

Parameters:

crossref_json (dict) – The Crossref JSON data containing the publication information.

Returns:

A tuple containing:

A boolean indicating success (True) or failure (False) of the extraction.
A dictionary where keys are author names and values are lists of their affiliations.

Return type:

tuple[bool, dict[str, list[str]]]

Design:: Uses helper methods to extract author objects and organize them into a sequence. Processes both the first author and additional authors separately. Creates a dictionary mapping author names to their department affiliations. Logs a warning if no department affiliations are found.
Summary:: Extracts and returns a dictionary of author names mapped to their department affiliations from Crossref JSON data.
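
The mapping from author names to affiliations can be sketched as below. This is a simplification: it walks the author list in one pass instead of handling first and additional authors separately, the function name is hypothetical, and treating "at least one affiliation found" as success is an assumption about what the boolean means.

```python
from typing import Dict, List, Tuple


def extract_departments(crossref_json: dict) -> Tuple[bool, Dict[str, List[str]]]:
    departments: Dict[str, List[str]] = {}
    for item in crossref_json.get("author", []):
        given, family = item.get("given"), item.get("family")
        if not given or not family:
            continue  # skip authors without a complete name
        departments[f"{given} {family}"] = [
            a["name"] for a in item.get("affiliation", []) if "name" in a
        ]
    # Assumed success rule: some author has at least one affiliation.
    return any(departments.values()), departments


record = {
    "author": [
        {"given": "Ada", "family": "Lovelace",
         "affiliation": [{"name": "Department of Mathematics"}]},
        {"given": "Alan", "family": "Turing", "affiliation": []},
    ]
}
ok, departments = extract_departments(record)
```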

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefCategoriesExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

A strategy for extracting category information from Crossref data.

This class implements the AttributeExtractionStrategy for category extraction specifically from Crossref JSON data. It focuses on retrieving the categories associated with a publication.

extract_attribute()[source]#: Extracts the categories from the Crossref entry.

Design:: Implements the Strategy pattern for category extraction from Crossref data. Uses a simple dictionary lookup to retrieve category information.
Summary:: Provides a specialized strategy for extracting publication categories from Crossref data entries.

__init__(warning_manager)[source]#

Initializes the CrossrefCategoriesExtractionStrategy.

This constructor sets up the strategy with a warning manager.

Parameters:: warning_manager (WarningManager) – An instance of WarningManager for handling extraction warnings.
Returns:: None

Design:: Calls the superclass constructor to set up the warning manager.
Summary:: Prepares the strategy instance for category extraction from Crossref data.

extract_attribute(crossref_json)[source]#

Extracts the categories from the Crossref entry.

This method retrieves the categories associated with a publication from the Crossref JSON data.

Parameters:

crossref_json (dict) – The Crossref JSON data containing the publication information.

Returns:

A tuple containing:

A boolean indicating success (True) if categories are found, False otherwise.
A list of category strings, or None if no categories are found.

Return type:

tuple[bool, list[str] | None]

Design:: Uses a simple dictionary get method to retrieve the categories. Returns True only if categories are present.
Summary:: Extracts and returns the categories associated with a publication from Crossref JSON data.
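
A sketch of the dictionary lookup described above. The key name "categories" is an assumption (this page does not state which key the class reads), as is treating an empty list the same as a missing key.

```python
from typing import List, Optional, Tuple


def extract_categories(
    crossref_json: dict, key: str = "categories"  # assumed key name
) -> Tuple[bool, Optional[List[str]]]:
    categories = crossref_json.get(key)  # None when the key is absent
    return bool(categories), categories


found, categories = extract_categories({"categories": ["Biology", "Genetics"]})
```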

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefCitationCountExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

A strategy for extracting citation count information from Crossref data.

This class implements the AttributeExtractionStrategy for citation count extraction specifically from Crossref JSON data. It focuses on retrieving the number of times a publication has been cited.

extract_attribute()[source]#: Extracts the citation count from the Crossref entry.

Design:: Implements the Strategy pattern for citation count extraction from Crossref data. Uses a simple dictionary lookup to retrieve citation count information.
Summary:: Provides a specialized strategy for extracting publication citation counts from Crossref data entries.

__init__(warning_manager)[source]#

Initializes the CrossrefCitationCountExtractionStrategy.

This constructor sets up the strategy with a warning manager.

Parameters:: warning_manager (WarningManager) – An instance of WarningManager for handling extraction warnings.
Returns:: None

Design:: Calls the superclass constructor to set up the warning manager.
Summary:: Prepares the strategy instance for citation count extraction from Crossref data.

extract_attribute(crossref_json)[source]#

Extracts the citation count from the Crossref entry.

This method retrieves the number of times a publication has been cited from the Crossref JSON data.

Parameters:

crossref_json (dict) – The Crossref JSON data containing the publication information.

Returns:

A tuple containing:

A boolean always set to True (as the method always returns a count, even if it’s 0).
An integer representing the citation count.

Return type:

tuple[bool, int]

Design:: Uses a dictionary get method to retrieve the citation count, defaulting to 0 if not found. Always returns True as the first element of the tuple, as a count is always available.
Summary:: Extracts and returns the citation count for a publication from Crossref JSON data.
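
The default-to-zero lookup can be sketched in a few lines. Crossref work records expose the citation count under 'is-referenced-by-count'; that this class reads that exact key is an assumption, since the page does not name it.

```python
from typing import Tuple


def extract_citation_count(crossref_json: dict) -> Tuple[bool, int]:
    # A missing key is treated as zero citations, so the flag is always True.
    return True, crossref_json.get("is-referenced-by-count", 0)


status, count = extract_citation_count({"is-referenced-by-count": 12})
```

One consequence of this design is that callers cannot tell "zero citations" apart from "count not reported" using the boolean alone.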

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefLicenseURLExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefPublishedPrintExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefCreatedDateExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefPublishedOnlineExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefJournalExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefURLExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefDOIExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefThemesExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, list[str]]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.

class academic_metrics.strategies.AttributeExtractionStrategies.CrossrefExtraContextExtractionStrategy(warning_manager)[source]#

Bases: AttributeExtractionStrategy

extract_attribute(entry_text)[source]#

Abstract method to extract a specific attribute from the entry text.

This method should be implemented by subclasses to define the specific extraction logic for each attribute type.

Parameters:

entry_text (dict) – The text or data from which to extract the attribute. Applies to both WoS and Crossref data.

Return type:

tuple[bool, str]

Returns:

The extracted attribute value.

Raises:

NotImplementedError – If the method is not implemented in a subclass.

Design:: This abstract method enforces a common interface for all attribute extraction strategies, allowing for polymorphic use of different strategies.
Summary:: Defines the contract for attribute extraction methods in subclasses.
