Daily Checklist: Data Management & Index Monitoring
Introduction to Daily Data Management Tasks
This checklist outlines the daily tasks required for data management within the Atlas of Living Australia (ALA) framework, focusing on index monitoring, scheduled job verification, and cache clearing. The goal is to ensure data integrity, system efficiency, and timely updates. The guide covers the period from October 27th to October 31st, 2025, and is designed for data management personnel. It provides step-by-step instructions, links to relevant resources, and the critical checks needed to maintain data quality and system health. Address any issues promptly and escalate concerns through the designated Slack channels; a proactive approach prevents larger problems, keeps data flowing smoothly, and supports the operational stability and data accuracy of the ALA's systems.
Daily Index Monitoring
Index switch monitoring is a critical daily task within the ALA data management workflow. Its primary objective is to verify the health and functionality of the data indexing process. This process ensures that the most current data is accessible and correctly integrated into the system.
To perform this check, you'll need to:
- Open stdout: Initiate this by using the step `Check index and update collection alias`. This will display the standard output, which contains the logs for the indexing process.
- Scroll to the bottom: The relevant information is at the end of the log. Scroll down to review the most recent entries.
- Record Values: Focus on the `Current#`, `NEW#`, and `DIFF#` values. These values are crucial indicators of the indexing process's status. `Current#` represents the current number of records in the existing index, `NEW#` indicates the number of records in the newly created index, and `DIFF#` shows the difference between the two, which helps in identifying discrepancies. A significant or unexpected difference may indicate a problem with the data indexing.
- Monitor for Severe Issues: The system's integrity check relies heavily on log analysis. The presence of any `SEVERE` issues in the log files indicates potential problems, ranging from data corruption to system failures. If you find `SEVERE` errors, the index switch may have failed and you need to investigate immediately. A parsing sketch for these checks follows this list.
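The exact format of these log lines is an assumption, but a minimal parsing sketch along the following lines can summarise the tail of the stdout log, pulling out the `Current#`, `NEW#`, and `DIFF#` values and flagging any `SEVERE` entries:

```python
import re
import sys

# Minimal sketch: summarise the tail of an index-switch stdout log.
# Assumes the log has been saved locally and that the counter lines
# literally contain "Current#", "NEW#" and "DIFF#" followed by a number
# (the exact log format is an assumption).
COUNTER_PATTERN = re.compile(r"(Current#|NEW#|DIFF#)\s*[:=]?\s*([\d,]+)")

def summarise_index_log(path: str, tail_lines: int = 200) -> None:
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()[-tail_lines:]  # only the most recent entries matter

    counters = {}
    severe = []
    for line in lines:
        for name, value in COUNTER_PATTERN.findall(line):
            counters[name] = int(value.replace(",", ""))
        if "SEVERE" in line:
            severe.append(line.strip())

    print("Recorded values:", counters)
    if severe:
        print(f"WARNING: {len(severe)} SEVERE entries found; investigate before accepting the switch")
        for entry in severe:
            print(" ", entry)

if __name__ == "__main__":
    summarise_index_log(sys.argv[1] if len(sys.argv) > 1 else "index-switch-stdout.log")
```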
Task Completion Status
- Tuesday: [x]
- Wednesday: [ ]
- Thursday: [ ]
- Friday: [ ]
Critical System Checks and Procedures
Cache Clearing
Clearing caches on both biocache and AVH is a vital step. Cache clearing helps to ensure that users are accessing the most up-to-date data. These systems store temporary data to speed up access, but they must be cleared to reflect the latest changes and updates. To clear the caches, navigate to the admin interfaces of both biocache and AVH and follow the appropriate procedures. Always clear the caches after the assertion sync and index switch.
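The admin interfaces are normally used directly, but if the cache-clear actions are exposed as HTTP endpoints they can also be scripted. The sketch below is only an illustration: the endpoint paths, the AVH host, and the API-key header are assumptions, not documented APIs.

```python
import requests

# Illustrative sketch only: request a cache clear on biocache and AVH after
# the assertion sync and index switch have finished. The endpoint paths,
# the AVH host, and the authentication header are placeholders (assumptions);
# use whatever actions the real admin interfaces expose.
ADMIN_ENDPOINTS = {
    "biocache": "https://biocache.ala.org.au/admin/clear-caches",  # hypothetical path
    "AVH": "https://avh.example.org/admin/clear-caches",           # hypothetical host and path
}

def clear_caches(api_key: str) -> None:
    for system, url in ADMIN_ENDPOINTS.items():
        response = requests.post(url, headers={"apiKey": api_key}, timeout=30)
        response.raise_for_status()
        print(f"{system}: cache clear requested (HTTP {response.status_code})")
```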
Scheduled Jobs Verification
Verifying that all scheduled jobs for the day have completed successfully is important for maintaining data integrity. Scheduled jobs automate various data processing tasks, from indexing to data synchronization. A monitoring sketch follows the list below.
- Check Job Completion: Regularly review the logs or monitoring dashboards to confirm that all scheduled jobs have run without errors. These jobs are critical for updating datasets, applying data transformations, and maintaining data consistency.
- Address Failures: If a job fails, identify the cause, which could range from data errors to system issues. Document the issue, and try to resolve it as quickly as possible. If the issue persists or is complex, report it on Slack.
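Assuming the scheduled jobs run as Apache Airflow DAGs with the stable REST API enabled (the base URL and credentials below are placeholders), a quick check for today's failed runs might look like this:

```python
from datetime import datetime, timezone

import requests

# Minimal sketch, assuming the scheduled jobs are Apache Airflow DAGs and the
# stable REST API (Airflow 2.x) is reachable. The base URL and credentials
# are placeholders.
AIRFLOW_BASE = "https://airflow.example.org/api/v1"
AUTH = ("monitor-user", "change-me")

def failed_runs_today() -> list[dict]:
    midnight = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    response = requests.get(
        f"{AIRFLOW_BASE}/dags/~/dagRuns",  # '~' lists runs across all DAGs
        params={"state": "failed", "start_date_gte": midnight.isoformat(), "limit": 100},
        auth=AUTH,
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("dag_runs", [])

if __name__ == "__main__":
    for run in failed_runs_today():
        print(f"FAILED: {run['dag_id']} ({run['dag_run_id']}) - investigate and report on Slack if needed")
```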
Assertion Synchronization
The assertion sync synchronizes user assertions across the system. These assertions are essential for data validation and quality control, so regularly confirming that the sync completed successfully is vital for keeping the system consistent.
- User Assertion Counts: Check user assertion counts in the DQ Profile using the user assertions query.
  - The total number of user assertions reflects all assertions in the system.
  - Unresolved user assertions are those that have not yet been processed. Monitoring both numbers is essential: a rise in unresolved assertions might indicate the need for additional data review or system adjustments. Ensure both figures stay within the expected range and address any significant discrepancies (see the threshold sketch after this list).
- Example: A healthy assertion sync looks like the following:
  - Total user assertions: 13,738
  - Unresolved user assertions: 10,660
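As a minimal illustration, a small comparison against the previous day's counts can flag when unresolved assertions rise unexpectedly. The baseline numbers below come from the example above; the alert threshold is an assumption and should be tuned to what is normal for the system.

```python
# Minimal sketch: compare today's assertion counts against a recorded
# baseline. The counts would normally come from the user assertions query in
# the DQ Profile; the baseline uses the example figures from this checklist
# and the threshold of 100 is an illustrative assumption.
BASELINE = {"total": 13_738, "unresolved": 10_660}

def check_assertion_counts(total: int, unresolved: int, max_unresolved_increase: int = 100) -> None:
    increase = unresolved - BASELINE["unresolved"]
    if total < BASELINE["total"]:
        print("WARNING: total user assertions dropped - the sync may not have completed")
    if increase > max_unresolved_increase:
        print(f"WARNING: unresolved assertions rose by {increase} - additional review may be needed")
    else:
        print("Assertion counts are within the expected range")

check_assertion_counts(total=13_750, unresolved=10_672)
```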
Image Processing Monitoring
Image processing involves monitoring the upload and processing of images within the ALA data system. The process involves two primary areas: image uploads and batch processing.
- Image Uploads: Check the image uploads area.
- Batch Uploads: Ensure that batch uploads are progressing without being stalled.
- Last Updated Field: Keep track of the `lastUpdated` field in a loading batch. This field indicates the progress of the upload and helps to identify any delays or stalled processes; a staleness sketch follows this list.
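One way to spot a stalled batch is to measure how long ago its `lastUpdated` value changed. The sketch below assumes the value is available as an ISO-8601 timestamp and uses a two-hour threshold; both the format and the threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Minimal sketch: flag a loading batch as stalled when its lastUpdated
# timestamp has not moved recently. The ISO-8601 format and the two-hour
# threshold are assumptions; adjust them to match the image service.
STALL_THRESHOLD = timedelta(hours=2)

def is_stalled(last_updated_iso: str) -> bool:
    last_updated = datetime.fromisoformat(last_updated_iso)
    return datetime.now(timezone.utc) - last_updated > STALL_THRESHOLD

# Example (hypothetical timestamp): True once more than two hours have passed.
print(is_stalled("2025-10-28T01:15:00+00:00"))
```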
EMR Cluster Monitoring
EMR (Elastic MapReduce) runs require close monitoring to ensure that no clusters run for extended periods. Long-running clusters consume resources and add unnecessary computing costs.
- Monitor Running Clusters: In EMR, filter clusters by status and select 'running'.
- Identify Long-Running Clusters: Any cluster that has been running for longer than 24 hours must be terminated to save on computing costs (see the sketch after this list).
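Assuming the clusters run on AWS EMR and boto3 credentials are already configured (the region below is a placeholder), a sketch that lists running clusters and flags any older than 24 hours might look like this:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Minimal sketch: list RUNNING EMR clusters and flag any that have been up
# for more than 24 hours. Credentials and permissions are assumed to be
# configured in the environment; the region is a placeholder.
emr = boto3.client("emr", region_name="ap-southeast-2")

def long_running_clusters(max_age: timedelta = timedelta(hours=24)):
    now = datetime.now(timezone.utc)
    flagged = []
    paginator = emr.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=["RUNNING"]):
        for cluster in page["Clusters"]:
            created = cluster["Status"]["Timeline"]["CreationDateTime"]
            age = now - created
            if age > max_age:
                flagged.append((cluster["Id"], cluster["Name"], age))
    return flagged

if __name__ == "__main__":
    for cluster_id, name, age in long_running_clusters():
        print(f"{cluster_id} ({name}) has been running for {age} - consider terminating it")
```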
Clearing Biocache Cache
- Biocache Cache Clearing: Clearing the biocache cache ensures that users access the most up-to-date data after an index switch.
- Sequence: The biocache cache must be cleared after the assertion sync is complete and the index switch has been successfully carried out.
- Admin Interface: Use the biocache admin interface, which can be found at: https://biocache.ala.org.au/admin.
Issue Reporting
- Report Issues: Any issues or problems encountered during the daily tasks should be reported immediately on the data management internal channel. This channel allows for prompt resolution and ensures the smooth operation of the system.
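If the data management internal channel has a Slack incoming webhook configured (the webhook URL below is a placeholder), monitoring scripts can post issues directly; this is a convenience sketch, not a required part of the checklist.

```python
import requests

# Minimal sketch: post an issue summary to the data management internal
# channel via a Slack incoming webhook. The webhook URL is a placeholder;
# use the one configured for your workspace.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def report_issue(task: str, details: str) -> None:
    payload = {"text": f":warning: Daily checklist issue in *{task}*: {details}"}
    response = requests.post(SLACK_WEBHOOK_URL, json=payload, timeout=10)
    response.raise_for_status()

report_issue("Index switch", "SEVERE errors found in the stdout log; switch not completed")
```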
Weekly and Manual Tasks
Sensitive Data Check
Once a week, initiate a sensitive data check. This involves executing a script designed to identify and assess sensitive data within the system.
- Run the SDS Script: The SDS script is available at: Report-SDS-Information-Prod.
- Commit the Report: Once the script has been executed, commit the new report to GitHub: SDS Information report.
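A sketch of the commit step, assuming the generated report lands in a local clone of the repository (the repository path, report filename, and commit message are placeholders):

```python
import subprocess

# Minimal sketch: commit and push the newly generated SDS report from a local
# clone of the repository. The repository path and report filename are
# placeholders; the actual report location may differ.
REPO_DIR = "/path/to/sds-information-repo"
REPORT_FILE = "sds-information-report.csv"

def commit_report() -> None:
    for cmd in (
        ["git", "add", REPORT_FILE],
        ["git", "commit", "-m", "Add weekly SDS information report"],
        ["git", "push"],
    ):
        subprocess.run(cmd, cwd=REPO_DIR, check=True)

if __name__ == "__main__":
    commit_report()
```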
Manual Index Switch
A manual index switch should be performed when necessary, such as when specific issues arise. It involves several key steps to ensure a smooth transition to a new index.
- Run DAG: If a manual index switch is required, run the DAG `Update-Collection-Alias` (a hedged trigger sketch follows this list).
- Update Configuration: Update the configuration value `new_collection`. Set this to the latest collection name. Find the new collection name in SOLR Admin or the `Create SOLR collection` step of the Full Index DAG run.
- Example:
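Assuming the DAGs run on Apache Airflow with the stable REST API enabled (the base URL, credentials, and collection name below are placeholders), triggering the manual switch with `new_collection` set might look like this:

```python
import requests

# Minimal sketch: trigger the Update-Collection-Alias DAG with the
# new_collection configuration value via the Airflow stable REST API.
# The base URL, credentials, and collection name are placeholders; take the
# real collection name from SOLR Admin or the "Create SOLR collection" step
# of the Full Index DAG run.
AIRFLOW_BASE = "https://airflow.example.org/api/v1"
AUTH = ("monitor-user", "change-me")

response = requests.post(
    f"{AIRFLOW_BASE}/dags/Update-Collection-Alias/dagRuns",
    json={"conf": {"new_collection": "biocache_20251031"}},  # placeholder collection name
    auth=AUTH,
    timeout=30,
)
response.raise_for_status()
print("DAG run created:", response.json().get("dag_run_id"))
```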