Check Documentation

chalicelib.checks.audit_checks.biosource_cell_line_value(connection, **kwargs)

Checks cell line biosources to make sure they have an associated ontology term.

chalicelib.checks.audit_checks.check_bio_feature_organism_name(connection, **kwargs)

Attempts to identify an organism to add to the organism_name field in BioFeature items. Checks the linked genes, then the genomic regions, and then the description.

chalicelib.checks.audit_checks.check_fastq_read_id(connection, **kwargs)

Reports if there are uploaded fastq files with integer read ids

chalicelib.checks.audit_checks.check_opf_status_mismatch(connection, **kwargs)

Check to make sure that collections of other_processed_files don’t have status mismatches. Specifically, checks that (1) all files in an other_processed_files collection have the same status; and (2) the status of the experiment set is at the same status level or higher than the status of files in the other_processed_files collection (e.g., flag other_processed_files that are released while the experiment set is still in review by lab).

chalicelib.checks.audit_checks.check_search_urls(connection, **kwargs)

Check the URLs in static sections that link to a search or browse page. Give a warning if the number of results is 0.

chalicelib.checks.audit_checks.check_validation_errors(connection, **kwargs)

Counts number of items in fourfront with schema validation errors, returns link to search if found.

chalicelib.checks.audit_checks.expset_opf_unique_files(connection, **kwargs)

Checks Experiments and Experiment Sets with other_processed_files and reports if any opf is also present among the raw, processed, or reference files.

chalicelib.checks.audit_checks.expset_opf_unique_files_in_experiments(connection, **kwargs)

Checks experiment sets with other_processed_files and looks for other_processed_files collections in child experiments to make sure that (1) the collections have titles and (2) if the titles are shared with the parent experiment set, the filenames contained within are unique.

chalicelib.checks.audit_checks.expset_opfsets_unique_titles(connection, **kwargs)

Checks experiment sets with other_processed_files to see if each collection of other_processed_files has a unique title within that experiment set.
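The uniqueness test this check performs amounts to a duplicate count over collection titles. A minimal standalone sketch (helper name and input shape are hypothetical, not the chalicelib implementation):

```python
from collections import Counter

def duplicate_titles(opf_collections):
    """Return titles that appear more than once among other_processed_files collections."""
    counts = Counter(c.get("title") for c in opf_collections)
    # untitled collections (None) are a separate problem, so skip them here
    return [title for title, n in counts.items() if title and n > 1]
```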

chalicelib.checks.audit_checks.external_expsets_without_pub(connection, **kwargs)

Checks external experiment sets to see if they are attributed to a publication.

chalicelib.checks.audit_checks.external_submission_but_missing_dbxrefs(connection, **kwargs)

Check if items with external_submission also have dbxrefs. When exporting metadata for submission to an external repository, external_submission is patched. After some time (delay), the corresponding dbxref should also have been received and patched.

chalicelib.checks.audit_checks.paired_end_info_consistent(connection, **kwargs)

Check that fastqs with a paired_end number have a paired_with related_file, and vice versa
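The invariant described above (a paired_end number implies a paired_with related file, and vice versa) can be sketched as follows; the dict shape and the “paired with” relationship label are assumptions for illustration:

```python
def paired_end_inconsistent(fastq):
    """True if paired_end and a 'paired with' related file disagree (one without the other)."""
    has_number = fastq.get("paired_end") is not None
    has_pair = any(r.get("relationship_type") == "paired with"
                   for r in fastq.get("related_files", []))
    return has_number != has_pair
```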

chalicelib.checks.audit_checks.released_hela_files(connection, **kwargs)

Check if fastq or bam files from HeLa cells have a visible status.

chalicelib.checks.audit_checks.released_output_from_restricted_input(connection, **kwargs)

Check if fastq or bam files produced by workflows with restricted input files (typically because deriving from HeLa cells) have a visible status. In addition, check if any fastq or bam processed file (with visible status) is not output of a workflow (‘unlinked’). If this happens, the check cannot ensure that all processed files are analyzed.

chalicelib.checks.audit_checks.restrict_hela(connection, **kwargs)

Patch the status of visible HeLa files to “restricted”

chalicelib.checks.badge_checks.compare_badges(obj_ids, item_type, badge, ff_keys)

Compares items that should have a given badge to items that do have the given badge. Used for badges that utilize a single message choice. Input (first argument) should be a list of item @ids.
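The core of this comparison is a set difference in both directions; a minimal sketch (the real check also builds patch bodies and check output, omitted here):

```python
def diff_badges(should_have, do_have):
    """Given lists of item @ids, return (ids needing the badge, ids to remove it from)."""
    should, have = set(should_have), set(do_have)
    return sorted(should - have), sorted(have - should)
```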

chalicelib.checks.badge_checks.compare_badges_and_messages(obj_id_dict, item_type, badge, ff_keys)

Compares items that should have a given badge to items that do have the given badge. Also compares badge messages to see if the message is the right one or needs to be updated. Input (first argument) should be a dictionary mapping each item’s @id to the badge message it should have.

chalicelib.checks.badge_checks.consistent_replicate_info(connection, **kwargs)

Check for replicate experiment sets that have discrepancies in metadata between replicate experiments.

Action patches badges with a message detailing which fields have the inconsistencies and what the inconsistent values are.

chalicelib.checks.badge_checks.exp_has_raw_files(connection, **kwargs)

Check for sequencing experiments that don’t have raw files. The action patches badges.

chalicelib.checks.badge_checks.gold_biosamples(connection, **kwargs)

Gold level commendation criteria:
1. Tier 1 or Tier 2 cells obtained from the approved 4DN source and grown precisely according to the approved SOP, including any additional authentication (e.g. the HAP-1 haploid line requires ploidy authentication).
2. All required metadata present (does not have a biosample warning badge).

chalicelib.checks.badge_checks.patch_badges(full_output, badge_name, ff_keys, single_message='')

General function for patching badges.
For badges with a single message choice:
- single_message kwarg should be assigned a string to be used for the badge message;
- full_output[output_keys[0]] should be a list of item @ids;
- no badges are edited; they are only added or removed.
For badges with multiple message options:
- single_message kwarg should not be used, but left as an empty string;
- full_output[output_keys[0]] should be a list of item @ids and the message to patch into each badge;
- badges can also be edited to change the message.

chalicelib.checks.badge_checks.repsets_have_bio_reps(connection, **kwargs)

Check for replicate experiment sets that have one of the following issues:
1) Only a single biological replicate (includes sets with single experiment)
2) Biological replicate numbers that are not in sequence
3) Technical replicate numbers that are not in sequence

Action patches badges with a message detailing which of the above issues is relevant.
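The three conditions can be sketched as a pure function over replicate numbers (input shape and issue labels are hypothetical, for illustration only):

```python
def repset_issues(bio_rep_nums, tech_rep_nums_by_bio):
    """Return a list of issue labels for a replicate set.

    bio_rep_nums: biological replicate numbers in the set.
    tech_rep_nums_by_bio: technical replicate numbers, keyed by bio replicate.
    """
    issues = []
    if len(set(bio_rep_nums)) < 2:
        issues.append("single biological replicate")
    bio = sorted(set(bio_rep_nums))
    if bio != list(range(1, len(bio) + 1)):
        issues.append("biological replicate numbers not in sequence")
    for techs in tech_rep_nums_by_bio.values():
        if sorted(techs) != list(range(1, len(techs) + 1)):
            issues.append("technical replicate numbers not in sequence")
            break
    return issues
```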

chalicelib.checks.badge_checks.yellow_flag_biosamples(connection, **kwargs)

Checks biosamples for required metadata:
1. Culture harvest date, doubling number, passage number, culture duration
2. Morphology image
3. Karyotyping (authentication doc or string field) for any biosample derived from a pluripotent cell line that has been passaged more than 10 times beyond the first thaw of the original vial
4. Differentiation authentication for differentiated cells
5. HAP-1 biosamples must have ploidy authentication

chalicelib.checks.es_checks.clean_s3_es_checks(connection, **kwargs)

Cleans old checks from both s3 and es older than one month. Must be called from a specific check as it will take too long otherwise.

chalicelib.checks.es_checks.elasticsearch_s3_count_diff(connection, **kwargs)

Reports the difference between the number of files on s3 and es

chalicelib.checks.es_checks.migrate_checks_to_es(connection, **kwargs)

Migrates checks from s3 to es. If a check name is given, only those checks will be migrated.

chalicelib.checks.header_checks.find_items_for_header_processing(connection, check, header, add_search=None, remove_search=None, append=True)

Find items that should have the given header added (add_search) and items it should be removed from (remove_search).
Args are:
- connection (FS connection)
- check (required; check object initialized by CheckResult)
- header @id (required)
- add_search: search query
- remove_search: search query
Meant to be used for CHECKS.

chalicelib.checks.header_checks.patch_items_with_headers(connection, action, kwargs)

Arguments are:
- the connection (FS connection)
- the action (from ActionResult)
- kwargs (from the action function)
Takes care of patching info on Fourfront and also populating fields on the action.

chalicelib.checks.higlass_checks.add_viewconf_static_content_to_file(connection, item_uuid, higlass_item_uuid, static_content_section, sc_location)

Add some static content for the item that shows the view config created for it. Returns True upon success.

Args:
connection: The connection to Fourfront.
item_uuid(str): Identifier for the item.
higlass_item_uuid(str): Identifier for the Higlass Item.
static_content_section(list): The current static content section for this item.
sc_location(str): Name for the new Static Content’s location field.
Returns:
boolean: True indicates success.
string: Contains the error (or an empty string if there is no error).
chalicelib.checks.higlass_checks.check_expsets_otherprocessedfiles_for_new_higlass_items(connection, **kwargs)

Search for Higlass Items from Experiment Set Other Processed Files (aka Supplementary Files) that need to be updated.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

minutes_leeway(integer, optional, default=10): Number of minutes after the action completed to compare against.
Returns:
check result object.
chalicelib.checks.higlass_checks.check_expsets_otherprocessedfiles_for_queried_files(connection, **kwargs)

Search for Higlass Items from Experiment Set Other Processed Files (aka Supplementary Files) that match the given query.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

search_queries(list, optional, default=[]): A list of search queries. All Files found in at least one of the queries will be modified.
minutes_leeway(integer, optional, default=1): Number of minutes after the action completed to compare against.
Returns:
check result object.
chalicelib.checks.higlass_checks.check_expsets_processedfiles_for_modified_higlass_items(connection, **kwargs)

Search for Higlass Items from Experiment Set Processed Files that need to be updated. ExpSets are chosen based on the search queries.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

minutes_leeway(integer, optional, default=10): Number of minutes after the action completed to compare against.
Returns:
check result object.
chalicelib.checks.higlass_checks.check_expsets_processedfiles_for_new_higlass_items(connection, **kwargs)

Search for Higlass Items from Experiment Set Processed Files that need to be updated. ExpSets are chosen based on the search queries.

Args:
connection: The connection to Fourfront. **kwargs, which may include:
Returns:
check result object.
chalicelib.checks.higlass_checks.check_expsets_processedfiles_for_queried_higlass_items(connection, **kwargs)

Search for Higlass Items from Experiment Set Processed Files that need to be updated. ExpSets are chosen based on the search queries.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

search_queries(list, optional, default=[]): A list of search queries. All ExpSets found in at least one of the queries will be modified.
minutes_leeway(integer, optional, default=1): Number of minutes after the action completed to compare against.
Returns:
check result object.
chalicelib.checks.higlass_checks.check_higlass_items_for_modified_files(connection, **kwargs)

Find files modified since the last time the action completed.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

minutes_leeway(integer, optional, default=10): Number of minutes after the action completed to consider the file modified.
Returns:
check results object.
chalicelib.checks.higlass_checks.check_higlass_items_for_new_files(connection, **kwargs)

Find files without Higlass Items.

Args:
connection: The connection to Fourfront. **kwargs
Returns:
check results object.
chalicelib.checks.higlass_checks.check_higlass_items_for_queried_files(connection, **kwargs)

Create or Update HiGlass Items for files found in the given query.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

search_queries(list, optional, default=[]): A list of search queries. All Files found in at least one of the queries will be modified.
minutes_leeway(integer, optional, default=1): Number of minutes after the action completed to compare against.
Returns:
check results object.
chalicelib.checks.higlass_checks.convert_es_timestamp_to_datetime(raw)

Convert the ElasticSearch timestamp to a Python Datetime.

Args:
raw(string): The ElasticSearch timestamp, as a string.
Returns:
A datetime object (or None)
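A minimal sketch of the conversion, assuming the microsecond-precision ISO format (the exact format string is an assumption, not confirmed by the source):

```python
from datetime import datetime

def parse_es_timestamp(raw):
    """Convert an ElasticSearch timestamp string to a datetime, or None if unparseable."""
    try:
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f")
    except (ValueError, TypeError):
        return None
```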
chalicelib.checks.higlass_checks.create_higlass_items_for_files(connection, check_name, action_name, called_by)

This action uses the results from check_files_for_higlass_viewconf to create or update new Higlass Items for the given Files.

Args:

connection: The connection to Fourfront.
check_name(string): Name of Foursight check.
action_name(string): Name of related Foursight action.
called_by(string, optional, default=None): uuid of the check this action is associated with. If None, use the primary result.
Returns:
An action object.
chalicelib.checks.higlass_checks.create_or_update_higlass_item(connection, files, attributions, higlass_item, ff_requests_auth)

Create a new Higlass viewconfig and update the containing Higlass Item.

Args:

connection: The connection to Fourfront.
files(dict): Info on the files used to create the viewconfig and Item. Also sets Item status.
    reference(list): A list of reference file accessions.
    content(list): A list of file dicts.
attributions(dict): Higlass Item permission settings using uuids.
    lab(string)
    contributing_labs(list): A list of contributing lab uuids.
    award(string)
higlass_item(dict): Determines whether to create or update the Item and how to present it.
    uuid(string or None): Update the Higlass Item with this uuid (or create a new one if None).
    title(string)
    description(string)
ff_requests_auth(dict): Information needed to connect to Fourfront.
    ff_auth(dict): Authorization needed to post to Fourfront.
    headers(dict): Header information needed to post to Fourfront.
Returns:
A dictionary:
item_uuid(string): The uuid of the new Higlass Item, or None if there was an error.
error(string): None if the call was successful.
chalicelib.checks.higlass_checks.files_not_registered_with_higlass(connection, **kwargs)

Used to check registration of files on higlass and also register them through the patch_file_higlass_uid action.

If confirm_on_higlass is True, check each file by making a request to the higlass server. Otherwise, just look to see if a higlass_uid is present in the metadata.

The filetype arg allows you to specify which filetypes to operate on. Must be one of: ‘all’, ‘bigbed’, ‘mcool’, ‘bg’, ‘bw’, ‘beddb’, ‘bed.multires.mv5’, ‘chromsizes’. ‘chromsizes’, ‘beddb’ and ‘bed.multires.mv5’ are from the raw files bucket; all other filetypes are from the processed files bucket.

higlass_server may be passed in if you want to use a server other than higlass.4dnucleome.org.

Set time_limit kwarg to 0 or None to disable time limit.

Since the ‘chromsizes’ file defines the coordSystem (assembly) used to register other files in higlass, these are registered first.

Args:
connection: The connection to Fourfront. **kwargs
Returns:
A check/action object.
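The chromsizes-first ordering this check uses can be sketched with a stable sort (the file_format field name is an assumption about the metadata shape):

```python
def registration_order(files):
    """Order files so 'chromsizes' entries register first; stable sort keeps the rest in place."""
    return sorted(files, key=lambda f: f.get("file_format") != "chromsizes")
```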
chalicelib.checks.higlass_checks.find_cypress_test_items_to_purge(connection, **kwargs)

Looks for all items that are deleted and marked for purging by cypress test.

Args:

connection: The connection to Fourfront. **kwargs
Returns:
A check/action object
chalicelib.checks.higlass_checks.find_expsets_otherprocessedfiles_requiring_higlass_items(connection, check_name, action_name, search_queries, find_opfs_missing_higlass=True, minutes_leeway=1)

Check to generate Higlass view configs on Fourfront for Experiment Sets Other Processed Files (aka Supplementary Files.)

Args:

connection: The connection to Fourfront.
check_name(string): Name of Foursight check.
action_name(string): Name of related Foursight action.
search_queries(list, optional, default=[]): A list of search queries. All Expsets found in at least one of the queries will be modified.
find_opfs_missing_higlass(boolean, optional, default=True): If True, search_queries is ignored and the check will find Other Processed File groups with missing Higlass Items.
minutes_leeway(integer, optional, default=1): Number of minutes after the action completed to compare against.

Returns:
check results object.
chalicelib.checks.higlass_checks.find_expsets_processedfiles_requiring_higlass_items(connection, check_name, action_name, search_queries, minutes_leeway=1)

Discover which ExpSets need Higlass Item updates based on their Processed Files or Processed Files in Experiment Sets.

Args:
connection: The connection to Fourfront.
check_name(string): Name of Foursight check.
action_name(string): Name of related Foursight action.
search_queries(list, optional, default=[]): A list of search queries. All ExpSets found in at least one of the queries will be modified.
minutes_leeway(integer, optional, default=1): Number of minutes after the action completed to compare against.
Returns:
check result object.
chalicelib.checks.higlass_checks.find_files_requiring_higlass_items(connection, check_name, action_name, search_queries, minutes_leeway=1)

Check to generate Higlass Items for appropriate files.

Args:
connection: The connection to Fourfront.
check_name(string): Name of Foursight check.
action_name(string): Name of related Foursight action.
search_queries(list, optional, default=[]): A list of search queries. All Files found in at least one of the queries will be modified.
minutes_leeway(integer, optional, default=1): Number of minutes after the action completed to compare against.
Returns:
check results object.
chalicelib.checks.higlass_checks.gather_processedfiles_for_expset(expset)

Collects all of the processed files for the given Experiment Set.

Args:
expset(dict): Contains the embedded Experiment Set data.

Returns:
A dictionary with the following keys:
genome_assembly(string, optional, default=””): The genome assembly all of the files use. Blank if there is an error or no files are found.
files(list): A list of identifiers for the discovered files.
auto_generated_higlass_view_config(string, optional, default=None): The uuid of the Higlass Item generated by a previous check.
manual_higlass_view_config(string, optional, default=None): The uuid of the Higlass Item that wasn’t automatically generated.
error(string, optional, default=””): Describes any errors generated.

chalicelib.checks.higlass_checks.get_reference_files(connection)

Find all of the tagged reference files needed to create Higlass view configs.

Args:
connection: The connection to Fourfront.
Returns:
Returns a dictionary of reference files.
Each key is a genome assembly (examples: GRCm38, GRCh38); each value is a list of uuids.
chalicelib.checks.higlass_checks.get_viewconf_status(files)

Determine the Higlass viewconf’s status based on the files used to compose it.

Args:
files(list) : A list of file objects that contain a status.
Returns:
A string.
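One plausible reading is that the viewconf takes the most restrictive status among its files; the ranking below is a hypothetical subset of Fourfront statuses, not the check’s actual table:

```python
# most visible first; hypothetical subset of Fourfront statuses
STATUS_RANK = ["released", "released to project", "in review by lab"]

def viewconf_status(files):
    """Return the most restrictive status among the given file objects."""
    ranks = [STATUS_RANK.index(f["status"]) for f in files if f.get("status") in STATUS_RANK]
    return STATUS_RANK[max(ranks)] if ranks else STATUS_RANK[-1]
```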
chalicelib.checks.higlass_checks.interpolate_query_check_timestamps(connection, search_query, action_name, result_check, minutes_leeway=1)

Search for Foursight check timestamps in the search query and replace them with the actual timestamp.

Args:
connection: The connection to Fourfront.
search_query(string): This query may have a substitute key phrase.
action_name(string): Name of the related action.
result_check(RunResult): This object can look for the history of other checks.
minutes_leeway(integer, optional, default=1): Number of minutes to move the timestamp into the future.
Returns:
The new search_query.
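The substitution amounts to replacing a placeholder token with the action’s completion time pushed minutes_leeway into the future; the token and timestamp format below are illustrative assumptions:

```python
from datetime import datetime, timedelta

def interpolate_timestamp(query, token, completed_at, minutes_leeway=1):
    """Replace `token` in the query with completed_at + minutes_leeway, as YYYY-MM-DDTHH:MM."""
    ts = (completed_at + timedelta(minutes=minutes_leeway)).strftime("%Y-%m-%dT%H:%M")
    return query.replace(token, ts)
```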
chalicelib.checks.higlass_checks.patch_expsets_otherprocessedfiles_for_new_higlass_items(connection, **kwargs)

Create Higlass Items for Files indicated in check_higlass_items_for_new_files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
An action object.
chalicelib.checks.higlass_checks.patch_expsets_otherprocessedfiles_for_queried_files(connection, **kwargs)

Update the Higlass Items from Experiment Set Other Processed Files (aka Supplementary Files).

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
action object.
chalicelib.checks.higlass_checks.patch_expsets_processedfiles_for_modified_higlass_items(connection, **kwargs)

Update the Experiment Set’s Higlass Items for its Processed Files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
action object.
chalicelib.checks.higlass_checks.patch_expsets_processedfiles_for_new_higlass_items(connection, **kwargs)

Update the Experiment Set’s Higlass Items for its Processed Files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
action object.
chalicelib.checks.higlass_checks.patch_expsets_processedfiles_for_queried_higlass_items(connection, **kwargs)

Update the Experiment Set’s Higlass Items for its Processed Files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
action object.
chalicelib.checks.higlass_checks.patch_file_higlass_uid(connection, **kwargs)

After running “files_not_registered_with_higlass”, try to register files with higlass.

Set time_limit kwarg to 0 or None to disable time limit.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

file_accession(string, optional, default=None): Only check this file.
force_new_higlass_uid(boolean, optional, default=False): If True, create a new higlass_uid for this file.
Returns:
A check/action object.
chalicelib.checks.higlass_checks.patch_higlass_items_for_modified_files(connection, **kwargs)

Create Higlass Items for Files indicated in check_higlass_items_for_modified_files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
An action object.
chalicelib.checks.higlass_checks.patch_higlass_items_for_new_files(connection, **kwargs)

Create Higlass Items for Files indicated in check_higlass_items_for_new_files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
An action object.
chalicelib.checks.higlass_checks.patch_higlass_items_for_queried_files(connection, **kwargs)

Create Higlass Items for Files indicated in check_higlass_items_for_queried_files.

Args:

connection: The connection to Fourfront. **kwargs, which may include:

called_by(optional, string, default=None): uuid of the associated check. If None, use the primary check
Returns:
An action object.
chalicelib.checks.higlass_checks.post_viewconf_to_visualization_endpoint(connection, reference_files, files, lab_uuid, contributing_labs, award_uuid, title, description, ff_auth, headers)

Given the list of files, contact fourfront and generate a higlass view config. Then post the view config. Returns the viewconf uuid upon success, or None otherwise.

Args:
connection: The connection to Fourfront.
reference_files(dict): Reference files, stored by genome assembly (see get_reference_files).
files(list): A list of file objects.
lab_uuid(string): Lab uuid to assign to the Higlass viewconf.
contributing_labs(list): A list of uuids referring to the contributing labs to assign to the Higlass viewconf.
award_uuid(string): Award uuid to assign to the Higlass viewconf.
title(string): Higlass view config title.
description(string): Higlass view config description.
ff_auth(dict): Authorization needed to post to Fourfront.
headers(dict): Header information needed to post to Fourfront.
Returns:
A dictionary:
view_config_uuid(string): The new Higlass view conf uuid if it succeeded, None otherwise.
error(string): Describes the error (blank if there is no error).
chalicelib.checks.higlass_checks.purge_cypress_items(connection, **kwargs)

Using the find_cypress_test_items_to_purge check, deletes the indicated items.

Args:

connection: The connection to Fourfront. **kwargs
Returns:
A check object
chalicelib.checks.higlass_checks.update_expsets_otherprocessedfiles_for_higlass_items(connection, check_name, action_name, called_by)

Create, Post and Patch HiGlass Items for the given Experiment Sets and their Other Processed Files (aka Supplementary Files) entries.

Args:

connection: The connection to Fourfront.
check_name(string): Name of Foursight check.
action_name(string): Name of related Foursight action.
called_by(string, optional, default=None): uuid of the check this action is associated with. If None, use the primary result.
Returns:
An action object.
chalicelib.checks.higlass_checks.update_expsets_processedfiles_requiring_higlass_items(connection, check_name, action_name, called_by)

Create or update Higlass Items for the Experiment Set’s Processed Files.

Args:

connection: The connection to Fourfront.
check_name(string): Name of Foursight check.
action_name(string): Name of related Foursight action.
called_by(string, optional, default=None): uuid of the check this action is associated with. If None, use the primary result.
Returns:
An action object.
chalicelib.checks.release_updates_checks.add_to_report(exp_set, exp_sets)

Used to process search hits in experiment_set_reporting_data

chalicelib.checks.release_updates_checks.data_release_updates(connection, **kwargs)

TODO: New version of this check - for now, does nothing - see old version above.

chalicelib.checks.release_updates_checks.experiment_set_reporting_data(connection, **kwargs)

Get a snapshot of all experiment sets, their experiments, and files of all of the above. Include uuid, accession, status, and md5sum (for files).

chalicelib.checks.release_updates_checks.find_item_title(item_id)

Data is always used for release updates, so it is hardcoded here (dirty hack). Uses a cache to improve performance. Returns the display title of the given item, or None if the item can’t be found.

chalicelib.checks.release_updates_checks.find_replacing_item(item_id)

Data is always used for release updates, so it is hardcoded here (dirty hack). Uses a cache to improve performance. Returns the @id of the replacing item, or None if the item can’t be found.

chalicelib.checks.release_updates_checks.generate_exp_set_report(curr_res, prev_res, **kwargs)

curr_res and prev_res are dictionary objects to compare. kwargs should include:
- add_ons (dict of important info passed to calculate_report_from_change)
- field_path (list representing the path of objects we have traversed)
- report_fields (list of which fields are significant to report on)
- released_statuses (list of which statuses to consider “released”)
- children_fields (list of which fields could be child objects)

Maybe this should be deprecated in favor of using existing equivalents in add_ons…
chalicelib.checks.release_updates_checks.publish_data_release_updates(connection, **kwargs)

TODO: This action probably needs rewriting as well, as it is based on the OLD data_release_updates check.

chalicelib.checks.release_updates_checks.sync_google_analytics_data(connection, **kwargs)

This checks the last time that analytics data was fetched (if any) and then triggers an action to fill up fourfront with incremented google_analytics TrackingItems.

TODO: No use case yet, but we could accept start_date and end_date here & maybe in action eventually.

chalicelib.checks.system_checks.check_long_running_ec2s(connection, **kwargs)

Flag all ec2s that have been running for longer than 1 week (WARN) or 2 weeks (FAIL) if their names contain any strings from flag_names, or if they have no name.
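The thresholds can be sketched as a pure function over launch time and instance name (the flag_names contents below are a hypothetical example, not the check’s real list):

```python
from datetime import datetime, timedelta

def ec2_flag(launch_time, now, name, flag_names=("temp", "test")):
    """Return 'WARN'/'FAIL' for long-running flagged or unnamed instances, else None."""
    flagged = (not name) or any(flag in name for flag in flag_names)
    if not flagged:
        return None
    age = now - launch_time
    if age > timedelta(weeks=2):
        return "FAIL"
    if age > timedelta(weeks=1):
        return "WARN"
    return None
```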

chalicelib.checks.system_checks.clean_up_travis_queues(connection, **kwargs)

Clean up old sqs queues based on the name (“travis-job”) and the creation date. Only run on data for now

chalicelib.checks.system_checks.elastic_search_space(connection, **kwargs)

Checks that our ES nodes all have a certain amount of space remaining

chalicelib.checks.system_checks.process_download_tracking_items(connection, **kwargs)

Do a few things here, and be mindful of the 5min lambda limit:
- Consolidate tracking items with download_tracking.range_query=True
- Change remote_ip to geo_country and geo_city
- If the user_agent looks to be a bot, set status=deleted
- Change unused range query items to status=deleted

chalicelib.checks.system_checks.purge_download_tracking_items(connection, **kwargs)

This check was originally created to take in any search through kwargs. It was changed to hardcode a search for tracking items, but it can easily be adapted; as it is, it already handles recording for any number of item types. Ensure the search includes limit, field=uuid, and status=deleted.

chalicelib.checks.system_checks.say_my_name(connection, **kwargs)

List the person working on each environment.

chalicelib.checks.system_checks.wipe_ff_build_indices(connection, **kwargs)

Wipes build (number prefixed) indices (on fourfront-testing)

chalicelib.checks.wfr_checks.atac_seq_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.atac_seq_status(connection, **kwargs)

Keyword arguments:
lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
start_date – limit search to files generated since a date formatted YYYY-MM-DD
run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.bam_re_start(connection, **kwargs)

Start bam_re runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.bam_re_status(connection, **kwargs)

Searches for fastq files that don’t have bam_re

chalicelib.checks.wfr_checks.bamqc_start(connection, **kwargs)

Start bamqc runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.bamqc_status(connection, **kwargs)

Searches for annotated bam files that do not have a qc object.

Keyword arguments:
lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
start_date – limit search to files generated since a date formatted YYYY-MM-DD
run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.bed2beddb_start(connection, **kwargs)

Start bed2beddb runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.bed2beddb_status(connection, **kwargs)

Searches for small bed files of certain types uploaded by users.

Keyword arguments:
lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
start_date – limit search to files generated since a date formatted YYYY-MM-DD
run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.bed2multivec_start(connection, **kwargs)

Start bed2multivec runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.bed2multivec_status(connection, **kwargs)

Searches for bed files of states types that don’t have bed2multivec.

Keyword arguments:
lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
start_date – limit search to files generated since a date formatted YYYY-MM-DD
run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.bg2bw_start(connection, **kwargs)

Start bg2bw runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.bg2bw_status(connection, **kwargs)

Searches for pairs files produced by 4dn pipelines that don’t have bg2bw.

Keyword arguments:
lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
start_date – limit search to files generated since a date formatted YYYY-MM-DD
run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.capture_hic_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.capture_hic_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead
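The run_time cutoff shared by these status checks (“assume runs beyond run_time are dead”) amounts to a simple age comparison; a sketch of that logic, with run_is_dead as a hypothetical name:

```python
from datetime import datetime, timedelta

def run_is_dead(started_at, now, run_time_hours=24):
    # A run is presumed dead if it started more than run_time_hours before now.
    return now - started_at > timedelta(hours=run_time_hours)
```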

chalicelib.checks.wfr_checks.chia_pet_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.chia_pet_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.chip_seq_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.chip_seq_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.compartments_caller_start(connection, **kwargs)

Start compartments caller runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.compartments_caller_status(connection, **kwargs)

Calls compartments on mcool files produced by the Hi-C pipeline

chalicelib.checks.wfr_checks.dilution_hic_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.dilution_hic_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.dnase_hic_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.dnase_hic_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.fastq_first_line_start(connection, **kwargs)

Start fastq_formatqc runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.fastqc_start(connection, **kwargs)

Start fastqc runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.fastqc_status(connection, **kwargs)

Searches for fastq files that don’t have fastqc runs.

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.in_situ_chia_pet_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.in_situ_chia_pet_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.in_situ_hic_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.in_situ_hic_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.insulation_scores_and_boundaries_start(connection, **kwargs)

Start insulation scores and boundaries caller runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.insulation_scores_and_boundaries_status(connection, **kwargs)

Calls insulation scores and boundaries on mcool files produced by the Hi-C pipeline

chalicelib.checks.wfr_checks.long_running_wfrs_fdn_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.long_running_wfrs_fdn_status(connection, **kwargs)

Find all runs with run status running/started. The associated action will clean up their metadata, which may lead to new runs being started.

Args:

  • limit_to_uuids: comma-separated uuids to be returned for deletion; use when only a subset of runs needs cleanup. Also works if a list is provided as input.

chalicelib.checks.wfr_checks.margi_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.margi_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.md5run_start(connection, **kwargs)

Start md5 runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.md5run_status(connection, **kwargs)

Searches for files that are uploaded to s3 but have not gone through an md5 run. This check makes certain assumptions:

  • all files with a status <= uploaded have gone through an md5 run
  • all files with status uploading/upload failed and no s3 file are pending, and are skipped by this check

If you change a status manually, the file might fail to show up in this check.

Keyword arguments:

  • file_type – limit search to a file type, i.e. FileFastq (default=File)
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead (default=24 hours)
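The status triage described above can be sketched as follows; the helper name and the exact status set are assumptions for illustration, not the check’s actual code:

```python
# Hypothetical status set; the real check derives this from portal metadata.
PENDING_STATUSES = {"uploading", "upload failed"}

def needs_md5_run(status, has_s3_file, has_md5_run):
    # Decide whether a file should get an md5 run queued.
    if status in PENDING_STATUSES and not has_s3_file:
        return False  # upload still pending: skipped by the check
    if has_md5_run:
        return False  # already went through an md5 run
    return has_s3_file  # on s3 but never hashed
```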

chalicelib.checks.wfr_checks.md5run_status_extra_file(connection, **kwargs)

Searches for extra files that are uploaded to s3 but have not gone through an md5 run. No action is associated; we don’t have any such case so far. An action will be implemented if this check ever returns WARN.

chalicelib.checks.wfr_checks.micro_c_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.micro_c_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.nad_seq_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.nad_seq_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.pairsqc_start(connection, **kwargs)

Start pairsqc runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.pairsqc_status(connection, **kwargs)

Searches for pairs files produced by 4DN pipelines that don’t have pairsqc runs.

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.plac_seq_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.plac_seq_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.problematic_wfrs_fdn_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.problematic_wfrs_fdn_status(connection, **kwargs)

Find all runs with run status error. The associated action will clean up their metadata, which may lead to new runs being started.

Args:

  • delete_category: comma-separated list of categories to delete with the action; by default Rerun is deleted
  • limit_to_uuids: comma-separated uuids to be returned for deletion; use when only a subset of runs needs cleanup. Also works if a list is provided as input.

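The limit_to_uuids kwarg above accepts either a comma-separated string or a list; normalizing it might look like this (parse_list_kwarg is a hypothetical helper, not the check’s actual code):

```python
def parse_list_kwarg(value):
    # Accept a comma-separated string, a list, or None; return a clean list.
    if not value:
        return []
    if isinstance(value, str):
        return [v.strip() for v in value.split(",") if v.strip()]
    return list(value)
```
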
chalicelib.checks.wfr_checks.repli_2_stage_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.repli_2_stage_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.repli_multi_stage_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.repli_multi_stage_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.rna_seq_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.rna_seq_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.rna_strandedness_start(connection, **kwargs)

Start rna_strandness runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.rna_strandedness_status(connection, **kwargs)

Searches for fastq files from seq-type experiments that don’t have beta_actin_count fields.

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead (default=24 hours)

chalicelib.checks.wfr_checks.tcc_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.tcc_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.template_start(connection, **kwargs)

Start template runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.template_status(connection, **kwargs)

Searches for fastq files that don’t have template

chalicelib.checks.wfr_checks.trac_loop_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.trac_loop_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wfr_checks.tsa_seq_start(connection, **kwargs)

Start runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wfr_checks.tsa_seq_status(connection, **kwargs)

Keyword arguments:

  • lab_title – limit search to a lab, e.g. Bing+Ren, UCSD
  • start_date – limit search to files generated since a date formatted YYYY-MM-DD
  • run_time – assume runs beyond run_time are dead

chalicelib.checks.wrangler_checks.add_contributing_lab_opf(connection, **kwargs)

Add contributing lab (the experimental lab that owns the experiment/set) to the other processed files (supplementary) analyzed by a different lab.

chalicelib.checks.wrangler_checks.add_suggested_enum_values(connection, **kwargs)

No action is added yet, this is a placeholder for automated pr that adds the new values.

chalicelib.checks.wrangler_checks.biorxiv_is_now_published(connection, **kwargs)

To restrict the check to just certain biorxivs, use a comma-separated list of biorxiv uuids in the uuid_list kwarg. This is useful if you want to perform the replacement on only a subset of the potential matches – i.e. re-run the check with a uuid list and then perform the actions on the result of the restricted check.

Known cases of incorrect associations are stored in the check result in the ‘false_positive’ field of full_output. To add new entries to this field use the ‘false_positive’ kwarg with format “rxiv_uuid1: number_part_only_of_PMID, rxiv_uuid2: ID …”

e.g. fd3827e5-bc4c-4c03-bf22-919ee8f4351f:31010829; to reset to empty, use ‘RESET’
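A minimal sketch of parsing the ‘false_positive’ kwarg format shown above, including the ‘RESET’ behavior (parse_false_positives is a hypothetical name, not the check’s actual code):

```python
def parse_false_positives(raw, existing):
    # Parse "uuid1: pmid1, uuid2: pmid2" into a dict, merging with the
    # previously stored false positives; "RESET" clears the stored list.
    if raw == "RESET":
        return {}
    result = dict(existing)
    for pair in raw.split(","):
        if ":" in pair:
            uuid, pmid = pair.split(":", 1)
            result[uuid.strip()] = pmid.strip()
    return result
```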

There are some examples where the title and author list differ enough that the PubMed esearch query doesn’t find the journal article. In order to allow the replacement, the movement of all the relevant fields, and the addition of replacement static sections in the action, a parameter is provided to manually input a mapping between a biorxiv (uuid) and a journal article (PMID:ID); the pairing is added to the result full_output and will be acted on by the associated action. The format of the input is uuid PMID:nnnnnn, uuid PMID:nnnnnn
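The manual biorxiv-to-PMID mapping format described above could be parsed with a sketch like this (parse_manual_mapping is a hypothetical helper):

```python
def parse_manual_mapping(raw):
    # Parse "uuid PMID:nnnnnn, uuid PMID:nnnnnn" into {uuid: "PMID:nnnnnn"}.
    mapping = {}
    for entry in raw.split(","):
        parts = entry.split()
        if len(parts) == 2 and parts[1].startswith("PMID:"):
            mapping[parts[0]] = parts[1]
    return mapping
```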

NOTE: because the data to transfer from biorxiv to pub is obtained from the check result, it is important to run the check (again) before executing the action, in case something has changed since the check was run.

chalicelib.checks.wrangler_checks.check_external_references_uri(connection, **kwargs)

Check if external_references.uri is missing while external_references.ref is present.

chalicelib.checks.wrangler_checks.check_for_ontology_updates(connection, **kwargs)

Checks for updates in one of the three main ontologies that the 4DN data portal uses: EFO, UBERON, and OBI.

  • EFO: checks the github repo for new releases and compares the release tag. The release tag is a semantic version number starting with ‘v’.
  • OBI: checks the github repo for new releases and compares the release tag. The release tag is a ‘v’ plus the release date.
  • UBERON: the github site doesn’t have official ‘releases’ (and the website isn’t properly updated), so checks for commits with a commit message containing ‘new release’

If version numbers to compare against aren’t specified in the UI, it will use the ones from the previous primary check result.
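For EFO-style tags (a semantic version starting with ‘v’), the comparison could be sketched as below; newer_release is a hypothetical helper and assumes purely numeric version parts, so it would not work for OBI’s date-based tags:

```python
def newer_release(current_tag, latest_tag):
    # Compare tags like "v3.30.0" numerically, part by part.
    def parse(tag):
        return tuple(int(p) for p in tag.lstrip("v").split("."))
    return parse(latest_tag) > parse(current_tag)
```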

chalicelib.checks.wrangler_checks.check_hic_summary_tables(connection, **kwargs)

Check for recently modified Hi-C Experiment Sets that are released. If any result is found, update the summary tables.

chalicelib.checks.wrangler_checks.check_opf_lab_different_than_experiment(connection, **kwargs)

Check if other processed files have a lab (the generating lab) that is different from the lab that generated the experiment. In this case, the experimental lab needs to be added to the opf as a contributing lab.

chalicelib.checks.wrangler_checks.check_suggested_enum_values(connection, **kwargs)

On our schemas we have a list of suggested values for suggested_enum tagged fields. A value that is not in this list can still be accepted, and with this check we find, for each suggested_enum field, all values that are not in the list. There are 2 functions below:

  • find_suggested_enum: this function takes the properties for an item type (taken from /profiles/) and goes field by field looking for suggested_enum lists; it is recursive, to take care of sub-embedded objects (tagged as type=object). Additionally, it takes ignored_enum lists (enums which are not suggested, but are ignored in the subsequent search)

  • after running this function, we construct a search url for each field, where we exclude all values listed under suggested_enum (and ignored_enum) from the search: i.e. if the FileProcessed field ‘my_field’ has options [val1, val2], the url would be /search/?type=FileProcessed&my_field!=val1&my_field!=val2&my_field!=No value
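The url construction described in that step can be sketched directly from the example (build_exclusion_url is a hypothetical helper name):

```python
def build_exclusion_url(item_type, field, excluded):
    # Exclude every suggested (and ignored) enum value, plus "No value".
    params = "".join("&{}!={}".format(field, v) for v in excluded)
    return "/search/?type={}{}&{}!=No value".format(item_type, params, field)
```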

  • extract value: once we have the search result for a field, we dissect it (again for sub-embedded items or lists) to extract the field value, and count occurrences of each new value (i.e. val3:10, val4:15)
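The value counting in that step maps naturally onto collections.Counter (count_new_values is a hypothetical name for the sketch):

```python
from collections import Counter

def count_new_values(values, suggested):
    # Count occurrences of values not already in the suggested list.
    return Counter(v for v in values if v not in suggested)
```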

*deleted items are not considered by this check

chalicelib.checks.wrangler_checks.get_biorxiv_meta(biorxiv_id, connection)

Attempts to get metadata for the provided biorxiv id; returns an error string if it fails.

chalicelib.checks.wrangler_checks.grouped_with_file_relation_consistency(connection, **kwargs)

Check if “grouped with” file relationships are reciprocal and complete. While other types of file relationships are automatically updated on the related file, “grouped with” ones need to be explicitly (manually) patched on the related file. This check ensures that there are no related files that lack the reciprocal relationship, or that lack some of the group relationships (for groups larger than 2 files).
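The reciprocity requirement can be illustrated with a small sketch; missing_reciprocal is a hypothetical helper, not the check’s implementation:

```python
def missing_reciprocal(relations):
    # relations maps each file to the set of files it is "grouped with".
    # Return (file, expected_related) pairs where the back-link is absent.
    missing = []
    for f, group in relations.items():
        for other in group:
            if f not in relations.get(other, set()):
                missing.append((other, f))
    return missing
```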

chalicelib.checks.wrangler_checks.new_or_updated_items(connection, **kwargs)

Currently restricted to experiment sets and experiments; the search query can be modified if desired.

Keeps a running total of the number of new/changed items since the last time the ‘reset’ action was run.

chalicelib.checks.wrangler_checks.patch_hic_summary_tables(connection, **kwargs)

Update the Hi-C summary tables

chalicelib.checks.wrangler_checks.patch_strandedness_consistency_info(connection, **kwargs)

Start rna_strandness runs by sending compiled input_json to run_workflow endpoint

chalicelib.checks.wrangler_checks.users_with_doppelganger(connection, **kwargs)

Find users that share emails or have very similar names.

Args:

  • emails: comma-separated emails to run the check on, i.e. when you want to ignore some of the results
  • ignore_current: if there are accepted catches, add them to emails and set ignore_current to true; they will not show up next time. If there are caught cases which are not problematic, you can add them to the ignore list.
  • reset_ignore: reset the ignore list and restart it; useful if you added something by mistake

Result:

  full_output: contains two lists, one for problematic cases and one for results to skip (the ignore list)

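The “very similar names” comparison could be approximated with difflib’s SequenceMatcher; the helper name and the 0.85 threshold are assumptions, not what the check actually uses:

```python
from difflib import SequenceMatcher

def similar_names(name1, name2, threshold=0.85):
    # Flag user names whose similarity ratio meets the threshold.
    ratio = SequenceMatcher(None, name1.lower(), name2.lower()).ratio()
    return ratio >= threshold
```
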
chalicelib.checks.wrangler_checks.users_with_pending_lab(connection, **kwargs)

Define comma-separated emails in scope if you want to work on a subset of all the results.

chalicelib.checks.wrangler_checks.validate_entrez_geneids(connection, **kwargs)

Query NCBI to see if geneids are valid.

chalicelib.checks.wrangler_checks.workflow_run_has_deleted_input_file(connection, **kwargs)

Checks all wfrs that are not deleted but have deleted input files. There is an option to compare to the last result and only report new cases (cmp_to_last). The full output has 2 keys, because we report provenance wfrs but do not run the action on them:

  • problematic_provenance: stores the uuid of the deleted file and the wfr that is not deleted
  • problematic_wfr: stores the deleted file, the wfr to be deleted, and its downstream items (qcs and output files)