» Session Details

Theme 1: Data Theory
Theme 2: Earth Systems Science Data
Theme 3: Health Care Data Systems

DS1: Astroinformatics: Learning from Data in the Astronomical Sciences 1

Description: Astronomy is on an exponentially rising curve of data growth. The Hubble Space Telescope (HST) and Sloan Digital Sky Survey (SDSS) data archives have yielded remarkably rich scientific output, including many thousands of research papers, and the new sky survey data sets of the coming decade, which will be thousands of times larger, promise even more discoveries. Part of that growth is in the time domain: the sky surveys will not consist of single snapshots across the sky, but will provide repeated time-series imaging of the sky. For example, the Large Synoptic Survey Telescope (LSST) survey promises to deliver, every 3 nights for 10 years, an amount of data equivalent to the entire SDSS. Consequently, the potential for discovering new and exotic objects, new classes of objects, new astrophysical processes, and new astronomical phenomena from current and future astronomical data collections is staggering. However, this potential will not be realized unless novel, scalable, and insightful data science (machine learning, data mining, statistics, knowledge discovery, visualization) methods and algorithms are developed, tested, and made ready to support these massive data-producing projects. This session will focus on data challenges, current projects, future projects, success stories, new algorithms, and new applications of existing algorithms from the new fields of Astroinformatics and Astrostatistics.

Convener:

Kirk Borne, Ph.D.
Professor of Astrophysics and Computational Science, George Mason University

DS2: Astroinformatics: Learning from Data in the Astronomical Sciences 2

Description: Astronomy is on an exponentially rising curve of data growth. The Hubble Space Telescope (HST) and Sloan Digital Sky Survey (SDSS) data archives have yielded remarkably rich scientific output, including many thousands of research papers, and the new sky survey data sets of the coming decade, which will be thousands of times larger, promise even more discoveries. Part of that growth is in the time domain: the sky surveys will not consist of single snapshots across the sky, but will provide repeated time-series imaging of the sky. For example, the Large Synoptic Survey Telescope (LSST) survey promises to deliver, every 3 nights for 10 years, an amount of data equivalent to the entire SDSS. Consequently, the potential for discovering new and exotic objects, new classes of objects, new astrophysical processes, and new astronomical phenomena from current and future astronomical data collections is staggering. However, this potential will not be realized unless novel, scalable, and insightful data science (machine learning, data mining, statistics, knowledge discovery, visualization) methods and algorithms are developed, tested, and made ready to support these massive data-producing projects. This session will focus on data challenges, current projects, future projects, success stories, new algorithms, and new applications of existing algorithms from the new fields of Astroinformatics and Astrostatistics.

Convener:

Kirk Borne, Ph.D.
Professor of Astrophysics and Computational Science, George Mason University
http://classweb.gmu.edu/kborne/

DS3: Visualization of Big Data

Description: Information to come.

Convener:

Juergen Symanzik, Ph.D.
Associate Professor of Mathematics and Statistics, Utah State University
Co-Editor of Computational Statistics
www.springer.com/180
www.math.usu.edu/~symanzik/

DS4: JCGS Highlights at the Interface

Description: Information to come.

Convener:

Richard A. Levine, Ph.D.
Professor and Chair, Department of Mathematics and Statistics, San Diego State University
Editor, JCGS
https://edoras.sdsu.edu/~ralevine/

DS5: Educating Data Scientists

Description: The term data science has exploded in popularity in the last couple of years. Talks in this session will explore the difference between a data scientist and an applied statistician, and how best we might go about educating them.

Convener:

Hadley Wickham, Ph.D.
Assistant Professor of Statistics, Dobelman Family Junior Chair Statistics, Rice University
www.had.co.nz

DS6: Random Solutions to Big Problems

Description: Randomized algorithms present an intriguing approach to dealing with “Big Data”. This session will present applications of random projections and, more generally, randomized linear algebra that extend the scope of classical multivariate methods such as PCA and Hotelling's T-squared test to modern high-dimensional data.

Conveners:

Eric C. Chi, Ph.D.
Postdoctoral Fellow, Department of Human Genetics, UCLA
www.ericchi.com

Miles Lopes
Ph.D. candidate, Department of Statistics, UC Berkeley

ES1: Data Science and Climate 1

Description: Climate change is one of the most pressing environmental issues of our time. Whether the changes are natural or anthropogenic, they promise to alter the landscape and have a profound impact on human welfare. At the same time, modern technologies have resulted in an explosion in the volume and types of information available to better understand the climate system. These data include both physical measurements and the output of climate models. Exploiting these information sources requires a holistic view and the development of new data analysis methods. It also requires an infrastructure whose design is both driven by and supportive of these new strategies. This session focuses on current statistical and machine learning methods for large or massive climate data sets, and on the requirements these methods impose on large-scale systems and architectures.

Convener:

Amy Braverman, Ph.D.
Jet Propulsion Laboratory, California Institute of Technology
https://dus.jpl.nasa.gov/home/braverman/

ES2: Data Science and Climate 2

Description: Climate change is one of the most pressing environmental issues of our time. Whether the changes are natural or anthropogenic, they promise to alter the landscape and have a profound impact on human welfare. At the same time, modern technologies have resulted in an explosion in the volume and types of information available to better understand the climate system. These data include both physical measurements and the output of climate models. Exploiting these information sources requires a holistic view and the development of new data analysis methods. It also requires an infrastructure whose design is both driven by and supportive of these new strategies. This session focuses on current statistical and machine learning methods for large or massive climate data sets, and on the requirements these methods impose on large-scale systems and architectures.

Convener:

Amy Braverman, Ph.D.
Jet Propulsion Laboratory, California Institute of Technology
https://dus.jpl.nasa.gov/home/braverman/

ES3: From Large Earth Science Datasets to Compelling Scientific Results

Description: The rapid growth of Earth observation from satellites, particularly over the last couple of decades, has resulted in unprecedented amounts of data being generated on a daily basis, ranging from radiance measurements to various levels of derivative products that describe a variety of oceanic, terrestrial, and atmospheric features and phenomena. In addition, large sets of data are acquired daily from various networks of ground-based and ocean-based systems and instruments, as well as from field experiments and campaigns conducted occasionally in different parts of the world. These data provide extensive information on the state of the Earth system, and also offer unique opportunities for advancing global and regional model development and data assimilation. The sheer volume of these data sets can often feel overwhelming, not only because of the huge storage and processing resources they demand, but also because of the incredible amount of effort it takes to analyze them and tease out coherent scientific results. This session encourages topics in the areas of data analysis and/or synergy with modeling that demonstrate success stories of intelligent use of the interface between computing science and statistics to derive compelling scientific results in Earth Science from extremely large datasets acquired by multiple satellite sensors and other measurement systems.

Conveners:

Charles Ichoku, Ph.D.
NASA Goddard Space Flight Center, Greenbelt, MD 20771, U.S.A.
www.science.gsfc.nasa.gov/sed/index.cfm?fuseAction=people.jumpBio&iphonebookid=21456

Mian Chin, Ph.D.
NASA Goddard Space Flight Center, Greenbelt, MD 20771, U.S.A.
http://acdb-ext.gsfc.nasa.gov/People/Chin/

ES4: Analysis of Big Data for Atmospheric Aerosol Research

Description: Aerosol forcing is one of the major sources of uncertainty in modeling climate forcing over the industrial period. Assessing the amount of fine particulates in the atmosphere is also important for evaluating the risks to human health. Data from the current generation of NASA’s Earth-observing satellite instruments, including the Moderate Resolution Imaging Spectroradiometer (MODIS Terra and Aqua), the Multi-angle Imaging SpectroRadiometer (MISR Terra), and the Ozone Monitoring Instrument (OMI Aura), have been widely used to constrain aerosol forcing uncertainty and to assess regional air quality. However, the statistical analysis of aerosol data typically requires special techniques to cope with the size of the data sets or to treat other exogenous variables, such as meteorological conditions and cloud coverage, that affect satellite aerosol retrievals. Missing and noisy observations also pose considerable difficulties for scientists researching the global distribution of aerosols, aerosol transport, and comparisons between satellite observations, ground-based data, and global climate model outputs. This session focuses on:
1. Tools and new approaches to the statistical analysis of large aerosol datasets, including satellite data extraction and inter-comparison
2. Statistical models that predict missing values, denoise the observed values, and quantify the spatial-prediction uncertainties in satellite datasets
3. Statistical models that relate satellite aerosol data, ground-based observations, and model simulations to assess aerosol impact on air quality and climate forcing

Convener:

Olga Kalashnikova, Ph.D.
Research Scientist, Jet Propulsion Laboratory, California Institute of Technology
https://science.jpl.nasa.gov/ppeople/Kalashnikova/

ES5: Massive Data Challenges in Numerical Weather Modeling

Description: This session will focus on the challenges of handling massive amounts of data in numerical weather and climate modeling, and on data management systems that alleviate those challenges.

Convener:

Robert Walko, Ph.D.
Senior Scientist, Division of Meteorology and Physical Oceanography, University of Miami
https://www.rsmas.miami.edu/newsroom/hurricane-experts/

ES6: Climate Data Analysis: From Satellites to Climate Models

Description: To better understand the climate system, including natural variability, human-induced perturbations, and future changes, climate scientists use a range of high-volume observational datasets and climate model output. Oftentimes, these two sources of information are used in conjunction. Observations describe the climate system and reveal how climate is changing, and models help us to understand why. This requires not only large computational resources, but also well-designed, meticulous analysis techniques. This session focuses on how useful scientific results can be extracted from large climate data sets, provides examples of such important results, and looks to the future of large data set analysis.

Convener:

Robert Allen, Ph.D.
Assistant Professor, Dept Earth Sciences, UC Riverside
http://faculty.ucr.edu/%7erjallen

ES7: Statistical Learning in Earth Systems Science

Description: Information to come.

Conveners:

David A. van Dyk, Ph.D.
Statistics Section, Dept of Mathematics, Imperial College London
http://www2.imperial.ac.uk/%7edvandyk/

Richard A. Levine, Ph.D.
Professor and Chair, Department of Mathematics and Statistics, San Diego State University
Editor, JCGS
https://edoras.sdsu.edu/~ralevine/

HS1: Healthcare Process Reform

Description: Why not begin a Process Analysis of Healthcare by focusing on high free-market cost for research, pharmaceuticals and technology with health and malpractice insurance as well as highly insurance-regulated access, cost and quality for service, physician and hospital Care of the Public? Why not resolve conflicting Stakeholder Goals, direct Funding to incentivize Goals, use Engineering to convert research, pharmaceuticals, technology and insurance into better Delivery of Care to the Public, analyze Big Data to support them, and institute Accountability for Goals? Why does Healthcare Leadership not exercise its inherent responsibility to manage the Healthcare Process toward including the Public as a Stakeholder, and striving for more cost-effective access and quality of Care Delivery to the Public? Are Translational Medicine and its Genomics component not worthy of similar Process Analysis if appropriately interpreted?

Convener:

Arnold Goodman, Ph.D.
Collaborative Data Solutions

HS2: Big Data Challenges in Bioinformatics and Medical Informatics

Description: This session addresses the curation and analysis of big data sets generated by next-generation sequencing experiments (Illumina, 454, and others). It will also cover the latest strategies for mining these data sets for biological and medical discovery.

Conveners:

Nadim Alkharouf, Ph.D.
Associate Professor, Computer and Information Sciences, Towson University
http://www.towson.edu/mb3/faculty-members.asp#nalkharouf

Ian Misner, Ph.D.
Postdoctoral Fellow, Towson University
http://cels.uri.edu/bio/lanelab/people.html

HS3: Application of Big Data and Analytics to the Healthcare Setting (Panel Discussion)

Description: Vast amounts of data related to every aspect of healthcare are collected daily. Giving meaning to these data can provide us with new insights into all aspects of healthcare, from research, to healthcare systems, to individual patients. Harnessing big data to understand health and disease, to manage and interpret clinical trials, to improve efficiencies in patient care, and to drive innovation are some examples of the power of big data in healthcare. This session, “Application of Big Data and Analytics to the Healthcare Setting,” will explore the current and future uses of big data in healthcare and the challenges inherent in applying big data to healthcare.

Convener:

Janeen Hill, Ph.D.
Professor of Biological Sciences, Schmid College of Science and Technology, Chapman University
http://www.chapman.edu/our-faculty/janeen-hill

Registration Fees

General Admission, Full Conference - $300 (early bird), $375 (at door)

General Admission, One Day Only - $175 (early bird and at door)

Student with ID - $100

Chapman Faculty and Staff - $50
