The Big Data Program: Arecibo Observatory Data Archive
By admin, 19 July 2021 #AOScienceNow
Through the Big Data Program at the Arecibo Observatory (AO), we are developing the Arecibo Archives Data Catalog to facilitate access to AO's projects, observations, datasets, and attributes. Approximately half of the AO database is currently available in the catalog: https://www.naic.edu/datacatalog/
The purpose of the Data Catalog is to provide a user-friendly portal where users can browse, query, and explore the projects observed at Arecibo over more than 55 years. The catalog consolidates multiple data sources that were built up throughout AO's operation. Its main component is the Projects Catalog, which provides all of the technical information about a proposal or project; this is essentially what scientists submitted as a proposal to receive Arecibo observing time. The Projects Catalog is complemented by the Observations Log, the Files Catalog, and the Attributes Catalog. The Observations Log provides a detailed log recorded by the observing scientists for each project. The Files and Attributes catalogs cover all of the raw data files captured during the observations, as well as key metadata for those files.
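To make the relationship between the four catalogs concrete, here is a minimal sketch of how they could be linked in a relational database. It is not the actual AO schema: the table names, column names, and the use of SQLite are all assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption,
# not the layout of the real Arecibo catalog database.
SCHEMA = """
CREATE TABLE projects (
    project_id TEXT PRIMARY KEY,
    title      TEXT,
    abstract   TEXT
);
CREATE TABLE observations (
    observation_id INTEGER PRIMARY KEY,
    project_id     TEXT NOT NULL REFERENCES projects(project_id),
    log_entry      TEXT
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    location   TEXT NOT NULL,
    size_bytes INTEGER
);
CREATE TABLE attributes (
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    name    TEXT NOT NULL,
    value   TEXT
);
"""


def build_catalog(conn):
    """Create the four illustrative catalog tables on an open connection."""
    conn.executescript(SCHEMA)


def attributes_for_project(conn, project_id):
    """Return (location, attribute name, value) rows for one project's files."""
    rows = conn.execute(
        "SELECT f.location, a.name, a.value "
        "FROM attributes AS a JOIN files AS f ON f.file_id = a.file_id "
        "WHERE f.project_id = ? ORDER BY f.location, a.name",
        (project_id,),
    )
    return rows.fetchall()
```

In this sketch the project identifier is the key that ties observations and files back to a proposal, and each attribute row points at the file it was extracted from.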
To build this catalog, the Big Data team first worked to identify and catalog all of the projects that have been carried out at Arecibo. This was no easy task, since the data was stored in many formats over the years. For each format, the team created scripts that scraped or extracted all of the technical information from the documents and saved it into a database. This first step is the foundation of the Data Catalog.
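The per-format extraction described above can be sketched as a small registry that maps each document format to its own parser. The two formats shown (JSON and plain "Key: value" text), the file extensions, and all function names are hypothetical; the article does not say which formats or tools the team actually used.

```python
import json
from pathlib import Path


def parse_json(text):
    """Parse a proposal stored as a flat JSON object (hypothetical format)."""
    data = json.loads(text)
    return {str(key).lower(): str(value) for key, value in data.items()}


def parse_key_value(text):
    """Parse a proposal stored as 'Key: value' lines (hypothetical format)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip().lower()] = value.strip()
    return record


# One parser per format; supporting another format means adding one entry.
PARSERS = {".json": parse_json, ".txt": parse_key_value}


def extract_project(path):
    """Pick a parser from the file extension and return a normalized record."""
    path = Path(path)
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        raise ValueError(f"no parser registered for {path.suffix!r}")
    return parser(path.read_text(encoding="utf-8"))
```

Every parser returns the same kind of dictionary, so the code that writes records into the database does not need to know which format a document came from.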
In a similar way, the team extracted and compiled the Observations Log using log information that existed in different locations. Most of the observations were already saved in a database, which made them easier to integrate into the catalog. The Files Catalog is being built as the datasets are copied to the Texas Advanced Computing Center. Once a dataset is copied, the team catalogs it and creates a record for it within the Catalog Database, keeping a record of the file location, corresponding project, and size. Finally, the Attributes Catalog is being actively populated by extracting headers, metadata, and attributes from the raw files. This is done using scripts that navigate through the server's paths and extract the attributes from each file. The extracted information is cataloged and saved into a database that keeps a record of all scientific attributes, including the related file name and project.
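A rough sketch of the two passes described above, under stated assumptions: the first walks a directory tree and records each file's location, project, and size; the second reads attributes from the top of a file. The directory layout (one top-level directory per project) and the "KEY = value ... END" text header are hypothetical stand-ins, since the article does not describe the real storage layout or file formats.

```python
from pathlib import Path


def catalog_files(root):
    """Yield one record per file under root: location, project, and size."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Assumption: the first directory below root names the project.
        if len(relative.parts) < 2:
            continue  # a file directly under root has no project directory
        yield {
            "location": relative.as_posix(),
            "project": relative.parts[0],
            "size_bytes": path.stat().st_size,
        }


def read_text_header(path, max_lines=50):
    """Collect 'KEY = value' lines from the top of a file, stopping at 'END'.

    This header layout is hypothetical; real observatory formats would need
    their own readers.
    """
    attributes = {}
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(max_lines), handle):
            line = line.strip()
            if line == "END":
                break
            key, sep, value = line.partition("=")
            if sep and key.strip():
                attributes[key.strip()] = value.strip()
    return attributes
```

Each record from `catalog_files` corresponds to one Files Catalog entry, and the dictionary from `read_text_header` could be stored as Attributes Catalog rows keyed by the same file location.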
This catalog's importance is incalculable. It is the stepping stone toward making Arecibo's datasets accessible to the community and to curious minds. The Data Catalog project is a computing strategy that will make the necessary data and resources widely available to the scientific community, continuing the Arecibo Observatory’s legacy of enabling groundbreaking new results about our atmosphere, our Solar System, and our universe.
Article written by Eng. Julio Alvarado Negrón
Big Data Manager
Keywords: observatory, arecibo, data, big, data, catalog, texas, TACC, advanced, computing