JPL and USC, under the direction of Dr. Chris Mattmann, have worked to collect a corpus of “deep web” polar datasets spanning many file types containing scientific data such as images, videos, and other information on the Web. These pieces of data were collected using Apache Nutch, Apache Tika, and Apache Solr.
Our goal is to aggregate this data into an intuitive search engine that scientists can use for polar research. Additionally, the data is analyzed and illustrated using the visualization tools Banana and D3.js, giving researchers a better understanding of how the data relates within the polar ecosystem.
|4 April 2017||Arctic Science Summit Week||Prague, Czechia||Presented in session "ARCTIC DATA AND INFORMATION SCIENCE MEETS SYSTEM SCIENCE"|
|24 July 2017||International Geoscience and Remote Sensing Symposium||Ft. Worth, USA||Presented in session "Intelligence for Big Geospatial Data"|
|16-18 September 2017||SAON - Arctic Data Committee||Montreal, Canada||Details|
|19-20 September 2017||Research Data Alliance||Montreal, Canada||Details|
|4-5 October 2017||NITR Open Knowledge Network||Washington DC, USA||Details|
|11-15 December 2017||Fall AGU||Washington DC, USA||Details|
|8 January 2018||Semantics Symposium||Washington DC, USA||Details|
|9-11 January 2018||ESIP Winter Meeting||Washington DC, USA||Poster Presentation|
|1-2 March 2018||1st U.S. Semantic Technologies Symposium (US2TS)||Wright State University, Dayton, Ohio, USA||Details|
|21-23 March 2018||Research Data Alliance||Berlin, Germany||Details|
|6-8 June 2018||Earthcube All Hands Meeting||Washington DC, USA||Details|
|17-20 July 2018||ESIP Summer Meeting||Tucson, USA||Details|
Search multiple keywords simultaneously for thousands of relevant URLs.
Add filters for more refined results using Banana's live-updating visualizations.
View data sets from a variety of sources to better understand polar relationships.
This code connects to the Solr database created for Sparkler-crawled data to perform further data extraction, classification, filtering, and insight generation using various machine learning models.
The ML models can take a keyword list from the user, extract features from URL content, classify (score) the output, and update the corresponding Solr parameters.
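The keyword, feature, score, and update flow described above can be illustrated with a minimal Python sketch. It is not the project's code: a simple keyword-coverage score stands in for the actual ML models, the function names are hypothetical, and the `relevance_score` field name is an assumption. The only Solr-specific part is the atomic-update document shape (`{"set": value}`), which would still need to be posted to a collection's update handler.

```python
import re
from collections import Counter


def extract_features(text, keywords):
    """Count occurrences of each single-word keyword in the page text."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {kw: tokens[kw.lower()] for kw in keywords}


def relevance_score(features):
    """Stand-in for a trained model: fraction of keywords seen at least once."""
    if not features:
        return 0.0
    hits = sum(1 for count in features.values() if count > 0)
    return hits / len(features)


def build_solr_update(doc_id, score, field="relevance_score"):
    """Build a Solr atomic-update document that sets one field on one document.

    The field name is an assumption; use whatever the Solr schema defines.
    """
    return {"id": doc_id, field: {"set": score}}


if __name__ == "__main__":
    page_text = "Sea ice extent in the Arctic; ice loss."
    feats = extract_features(page_text, ["ice", "arctic", "glacier"])
    print(build_solr_update("http://example.org/a", relevance_score(feats)))
```

Keeping feature extraction, scoring, and the Solr payload in separate functions means the scoring step could be swapped for a trained classifier without touching the rest.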
The Polar Deep Insights project is a tool that can be used as a generic content extraction and evaluation tool on any dataset.
It is a Dockerized pipeline consisting of content extraction, enrichment, and a rich visualization interface for exploring the spatial-conceptual-temporal TREC polar dataset and documents downloaded from the ACADIS, AMD, and NSIDC websites, crawled using the Sparkler web crawler. We plan to use this to gain deep insights about climate change and its impact on the Arctic region.
This project uses the Google Search API to provide a list of the most frequently occurring URLs for a list of domain keywords and phrases. The code first generates phrases from the provided keywords and then uses them for searching.
After each search, the top 10 URLs (or all active, working URLs from the first page) are added to a dictionary. After iterating through all keywords, the dictionary is sorted by frequency of occurrence.
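The counting step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: the source does not say how phrases are built, so `generate_phrases` simply joins keyword combinations, and `search` is a hypothetical callable standing in for the Google Search API call (it takes a phrase and returns a list of result URLs).

```python
from collections import Counter
from itertools import combinations


def generate_phrases(keywords, max_len=2):
    """Assumed phrase scheme: every combination of up to max_len keywords."""
    phrases = []
    for size in range(1, max_len + 1):
        for combo in combinations(keywords, size):
            phrases.append(" ".join(combo))
    return phrases


def rank_urls(phrases, search, top_n=10):
    """Count how many searches returned each URL, most frequent first.

    `search` is a hypothetical stand-in for the real search API client.
    """
    counts = Counter()
    for phrase in phrases:
        # dict.fromkeys drops duplicates within one result page, keeping order
        for url in dict.fromkeys(search(phrase)[:top_n]):
            counts[url] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Canned results in place of live API calls
    canned = {"ice": ["a", "b"], "arctic": ["b", "c"], "ice arctic": ["b"]}
    print(rank_urls(generate_phrases(["ice", "arctic"]), lambda p: canned.get(p, [])))
```

Passing the search function in as a parameter keeps the ranking logic testable without network access; a check that each URL is active and working could be added inside that function.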
A web crawler is a bot program that fetches resources from the web in order to build applications such as search engines and knowledge bases. Sparkler (a contraction of Spark-Crawler) is a new web crawler that draws on recent advances in distributed computing and information retrieval by combining various Apache projects and related libraries, such as Spark, Kafka, Lucene/Solr, Tika, and pf4j.
Sparkler is an extensible, highly scalable, high-performance web crawler that is an evolution of Apache Nutch and runs on an Apache Spark cluster.
LDA topic modeling for Polar Deep Insights.
Domain Discovery on Polar Domain
This is a FacetView setup for crawled ocean observation data.
The goal of the Text Retrieval Conference (TREC) is to encourage research in information retrieval from large text collections by providing interesting and understudied domains of documents to crawl.
Currently, the polar domain contains the NSF-funded Advanced Cooperative Arctic Data and Information System (ACADIS), the NASA-funded Antarctic Master Directory (AMD), and the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer. Our data was retrieved using these directories and submitted to TREC in 2015.
Hosted by the NSF, the goal of this hackathon was to implement visualizations of existing polar data sets to support new discoveries and promote cross-agency collaboration between the NSF, NASA, NOAA, and other Arctic- and polar-related agencies.
Ultimately, the workshop fostered the understanding of the variability of the polar regions at different timescales, allowing the NSF to make longer-term investments in technologies and visualizations that can be adopted by the community.
The Information Retrieval and Data Science Group’s (I.R.D.S.) mission is to research and develop new methodology and open source software to analyze, ingest, process, and manage Big Data and to turn it into information.
We have expertise in data collection and contribute to the world's largest and most often downloaded open-source projects, working with NASA, DARPA, DHS, and NIH across a number of domains, including Earth science, planetary science, astronomy, defense, and private industry.