### Program for 2021 Systems and Information Engineering Design Symposium (SIEDS)

#### Thursday, April 29

| Time | Main Zoom Room | Energy & Environment Track | Health Track | Systems Design Track | Data Track | Optimization, Simulation, & Decision Analysis | Infrastructure, Networks, & Policy Track |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 01:00 pm-01:05 pm | Welcome to SIEDS | | | | | | |
| 01:15 pm-02:15 pm | | Energy and Environment 1 | Health 1 | Systems Design 1 | Data 1 | Optimization, Simulation, and Decision Analysis 1 | |
| 02:30 pm-03:30 pm | Workshop: Crafting an Effective Portfolio in User Experience Design | | | | | | |
| 03:45 pm-04:45 pm | | Energy and Environment 2 | Health 2 | Systems Design 2 | Data 2 | Optimization, Simulation, and Decision Analysis 2 | Infrastructure, Networks, and Policy 2 |
| 07:15 pm-09:15 pm | | | | | Data 3 | | |

#### Friday, April 30

| Time | Main Zoom Room | Energy & Environment Track | Health Track | Systems Design Track | Data Track | Optimization, Simulation, & Decision Analysis | Infrastructure, Networks, & Policy Track |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 08:15 am-08:20 am | Welcome to Friday at SIEDS | | | | | | |
| 08:30 am-09:30 am | | Energy and Environment 4 | Health 4 | Systems Design 4 | Data 4 | | Infrastructure, Networks, and Policy 4 |
| 09:45 am-10:45 am | | Energy and Environment 5 | Health 5 | Systems Design 5 | Data 5 | Optimization, Simulation, and Decision Analysis 5 | Infrastructure, Networks, and Policy 5 |
| 10:55 am-11:55 am | Workshop: Machine Learning Introduction for Newbies | | | | | | |
| 12:00 pm-12:30 pm | Awards and Closing Ceremony | | | | | | |

### Thursday, April 29 1:00 - 1:05 (America/New_York)

#### Welcome to SIEDS

Room: Main Zoom Room

### Thursday, April 29 1:15 - 2:15 (America/New_York)

#### Energy and Environment 1

Room: Energy & Environment Track
Chairs: Md Mofijul Islam (University of Virginia, USA), Aya Yehia (University of Virginia, USA)
1:15
Angelika M Lindberg, Rachel Logan, Hannah Marron, Bethany Brinkman, Michelle Gervasio and Bryan Kuhr (Sweet Briar College, USA)
Completed in the summer of 2020, Sweet Briar College's 26,000 square foot greenhouse is home to a variety of vegetables, providing fresh food for the campus dining hall as well as giving students the opportunity to learn about food sustainability. The current horticulture practices in the greenhouse are functional, but growth rates and processing could be improved with a hydroponic system. Hydroponics is a subset of horticulture in which plants are rooted in nutrient-rich water rather than soil. A hydroponics system would not only serve as an educational opportunity for Sweet Briar's environmental science students but would also allow experimentation with a variety of new plants and growing methods. Some other benefits of a hydroponic system include a significant decrease in water waste, reduced need for pesticides and herbicides, and more efficient use of space. Through extensive research and compliance with customer specifications, we determined that using a Nutrient Film Technique (NFT) with an A-frame design would be the best option for the Sweet Briar College greenhouse. The NFT water flow method is one of the most respected in the field of hydroponics, as it is typically extremely reliable and user-friendly. A thin film of nutrient-laden water gently passes over the roots of the system, allowing the plant to absorb as needed. Incorporating this technique with an A-frame design would allow for the best use of space while still allowing each plant to receive optimal sunlight as compared to other vertically designed hydroponic systems. By incorporating microcontrollers and sensors to monitor the water level, pH, and electrical conductivity in the reservoir, our system will be able to dispense nutrient solution and water as needed.
We plan to measure and track plant growth and survivability based on both new plant growth as well as fruit/vegetable production with the goal of exceeding that of standard soil-grown plants of the same variety and anticipate having results by April 2021. We would also like to track system water loss, either from leaks, evaporation, or absorption, by monitoring the main water reservoirs with the goal of 75% efficiency over the course of a month. Finally, we would like to measure the amount of light the plants are getting throughout the day, using either a photoresistor and microcontroller or a lux meter, to determine if additional synthetic lighting options would be beneficial to the system. We plan to have the system fully functioning and operable by May 2021.
1:30
Ruoyu Zhang, Hyunglok Kim, Emily Lien, Diyu Zheng, Lawrence Band and Venkatraman Lakshmi (University of Virginia, USA)
As the intensity and frequency of storm events are projected to increase due to climate change, local agencies urgently need a timely and reliable framework for flood forecasting, downscaled from watershed to street level in urban areas. Integrating property data with various hydrometeorological data, the flood prediction model can also provide further insight into environmental justice, which will aid households' and government agencies' decision-making. This study uses deep learning (DL) methods and radar-based rainfall data to quickly predict the inundated areas and to analyze property and demographic data concerning stream proximity, providing a way to quantify socioeconomic impacts. We expect that our DL-based models will improve the accuracy of flood forecasting, provide a better picture of which communities bear the worst burdens of flooding, and encourage city officials to address the underlying causes of flood risk.
1:45
Christopher G Gacek, Derek J Gimbel, Samuel Longo, Benjamin I Mendel, Gabriel Sampaio and Thomas Polmateer (University of Virginia, USA); Mark C. Manasco (Commonwealth Center for Advanced Logistics Systems, USA); Daniel Hendrickson (Virginia Port Authority, USA); Timothy L Eddy, Jr and James H. Lambert (University of Virginia, USA)
Shipping trends in technology, regulation, energy and environment require maritime container ports to adapt their operations to better suit current and future conditions. This paper focuses on innovative solutions in three main areas of interest for ports: (1) clean energy technologies, (2) alternative financing and (3) automated process technologies. In this analysis, these areas of interest are explored using the Port of Virginia as a case study. Results are derived using scenario analysis methodology drawn from systems, risk and resilience analysis. Investment strategies in renewable energy sources are evaluated and project funding approaches, including the use of green bonds, are explored. AI systems relevant to port operations integration and container security are also described. The key results of this paper are twofold: (1) a demonstration ranking of initiatives for a port strategic plan and (2) a ranking of scenarios by their disruption on initiative impact. The results of the case study are of interest to the strategic planners at industrial ports and the maritime industry.
2:00
Jorge Barajas, Christian Detweiler, Cailyn Lager, Charles Seaver, Mark Vakarchuk, Justin Henriques and Jason Forsyth (James Madison University, USA)
This paper describes a toolkit for analyzing changes in algae levels in bodies of water as an indicator of eutrophication. Eutrophication is caused by excessive nutrient loading in a lake or other body of water, frequently due to fertilizer runoff. The enriched water can cause dense growth of plant life (e.g. algae blooms) in the water. When this growth dies, the bacteria associated with decomposition consume oxygen from the water, which can create a hypoxic environment (i.e. insufficient oxygen to sustain life). Not only is this an environmental problem, but also an economic problem. The estimated cost of damage mediated by eutrophication in the U.S. alone is approximately $2.2 billion annually. These costs come from a variety of factors: parks losing revenue from forced closure, clean up, and removal of algae. The key components of the system discussed in this paper are a drone, multispectral camera, and a spatial and temporal analysis software toolkit. The multispectral camera stores images on a removable SD card that are then imported into ArcGIS. Analysis is done through a custom Python toolkit created to determine vegetation health levels in bodies of water. The key focus of analysis is using the normalized difference vegetation index (NDVI) values captured from multispectral imaging to compare the different vegetation levels across various flight days. This system can help users combat eutrophication by allowing them to identify patterns and trends in the algal growth in bodies of water they manage in near real time. This may help, for example, identify patterns in fertilization and algal growth, and ultimately aid in keeping bodies of water healthy.
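As a rough illustration of the NDVI comparison described in the abstract above, the sketch below computes the standard index, NDVI = (NIR - Red) / (NIR + Red), per pixel and compares the mean value between two flight days. It is not the authors' toolkit: the function names, the zero-division convention, and the reflectance samples are all hypothetical, and plain Python lists stand in for the raster data that would come from ArcGIS.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch: treat empty pixels as 0.
        return 0.0
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average NDVI over paired NIR and red samples of one flight."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values)


# Hypothetical reflectance samples for the same water body on two flight days.
flights = {
    "day_1": {"nir": [0.30, 0.28, 0.35], "red": [0.10, 0.12, 0.09]},
    "day_2": {"nir": [0.45, 0.50, 0.48], "red": [0.08, 0.07, 0.09]},
}

# Mean NDVI per flight day, and the change between the two days.
summary = {day: mean_ndvi(b["nir"], b["red"]) for day, b in flights.items()}
change = summary["day_2"] - summary["day_1"]

for day, value in summary.items():
    print(f"{day}: mean NDVI = {value:.3f}")
print(f"change between flights = {change:+.3f}")
```

A rising mean NDVI between flights would be one possible signal of increasing vegetation (for example, algal growth); a real analysis would work per region of the water body rather than on a single average.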

#### Health 1

Room: Health Track
Chairs: Jenn Campbell (University of Virginia, USA), Chang Xu (University of Virginia, USA)
1:15
Emily Murphy, Swathi A Samuel, Joseph Cho, William Adorno, Marcel Durieux and Donald Brown (University of Virginia, USA); Christian Ndaribitse (University of Rwanda, Rwanda)
Millions of surgical operations are performed every year in African countries, and the lack of digitization of data associated with them inhibits the ability to study the linkages of perioperative data with perioperative mortalities [1]. Contrary to American operating rooms, where medical personnel are assisted by technologies that record and analyze patient vitals and other surgical data, low-income African operating rooms lack these resources and require their personnel to manually scribe this information onto paper flowsheets. In order to provide perioperative data to health care providers in Rwanda, the team designed and implemented image processing and machine learning techniques to automate checkbox detection for the digitization of surgical flowsheet data. A checkbox image is cropped based on its location with template matching and then processed through a trained convolutional neural network (CNN) to classify it as checked or unchecked. The template matching and CNN process were tested using 18 flowsheets. Of the 666 possible images, the template matching achieved an accuracy of 99.8%, and 96.7% of the cropped images were correctly classified using the CNN model.
1:30
Anna Bonaquist, Meredith Grehan, Owen Haines, Joseph Keogh, Tahsin Mullick and Neil Singh (University of Virginia, USA); Sam Shaaban (NuRelm, USA); Ana Radovic (University of Pittsburgh & UPMC Children's Hospital of Pittsburgh, USA); Afsaneh Doryab (University of Virginia, USA)
Mobile sensing and analysis of data streams collected from personal devices such as smartphones and fitness trackers have become useful tools to help health professionals monitor and treat patients outside of clinics.
Research in mobile health has largely focused on feasibility studies to detect or predict a health status. Despite the development of tools for collection and processing of mobile data streams, such approaches remain ad hoc and offline. This paper presents an automated machine learning pipeline for continuous collection, processing, and analysis of mobile health data. We test this pipeline in an application for monitoring and predicting adolescents' mental health. The paper presents system engineering considerations based on an exploratory machine learning analysis followed by the pipeline implementation.
1:45
Rachel Bigelow, Reese Bowling, Shivani Das, Zachary Dedas and Eric T Jess (University of Virginia, USA); Venkataraman Lakshmi (UVa, USA)
The COVID-19 pandemic has provoked longstanding and competing interests of the economy and environment. In January 2020, countries across the globe began implementing various levels of safety measures to slow the spread of the virus. Safety measures have run the gamut of restrictions: physical distancing guidelines, proper handwashing practices, and the use of face masks are on the lower end of the restriction spectrum, while travel restrictions, business closures, and country-wide lockdowns are instances of more stringent measures. Policy responses have drastically differed among governments across the globe, but the economic strife has plagued countries regardless of their COVID-19 response plan. Lockdowns in the first half of 2020 impeded economic activity, leading to a reduction in industrial activity and hence emissions. During this time period, observations from publicly available satellite sensors have shown that concentrations of various atmospheric pollutants, nitrogen dioxide especially, have decreased. The Asia-Pacific region was no exception, with China, Japan, South Korea, Australia, and New Zealand all experiencing slowdown in growth and large reductions in various economic sectors.
Using these five Asia-Pacific countries, we will analyze how government policy, lockdowns, and travel restrictions implemented during the COVID-19 outbreak have slowed economic growth in the transportation, manufacturing, and agriculture sectors, and in turn, impacted air quality and water quality. Conclusions and statistical significance of our analysis comparing coronavirus-related policies and their effect on economic growth and environmental health will help drive future decisions made by policymakers should another pandemic or similar global crisis arise.

#### Systems Design 1

Room: Systems Design Track
Chair: Dany Dada (University of Virginia, USA)
1:15
Derek J D'Alessandro, William Gunderson, Ethan Staten, Yann Kelsen Donastien, Pedro Rodriguez and Reid Bailey (University of Virginia, USA)
As data collection and analysis grow in demand across a diverse spectrum of industries, data is collected from many sensors at different ranges with different quantities and types of data. One general approach taken by commercial firms to integrate wireless sensor data is to develop proprietary "ecosystems" of products; home automation companies like NEST, home security companies like SimpliSafe, and agricultural companies like Davis Instruments each require that customers use their hubs with their peripheral sensors. The work in this paper applies a flipped approach where a heterogeneous set of sensors from a range of suppliers connects to a hub over a variety of wireless protocols. The design of the hub, therefore, needs to easily accommodate a wide range of communication and wireless protocols. The focus of this work is on exploring how modularity can be designed into the architecture of a product to facilitate quick and low-cost customization of the hub to a particular need. This particular work focuses on designing such a hub for various low-power wide-area network (LPWAN) applications.
LPWANs are technologies and protocols that have longer ranges and lower power usage than higher bandwidth protocols like Wi-Fi. LPWANs, like LoRa, specialize in applications where many sensors are distributed over larger distances and, due to the small amounts of data they intermittently send, require less power. This modular hub needs to be able to recognize the type of radio connected to it and the type of communication (I2C, SPI, UART) used by the radio. Such recognition will enable variable quantities of different radios to be connected to the hub without significant redesign of the electronics or the firmware. Furthermore, the housing for the hub needs to be sufficiently modular so that any radio could be inserted without requiring a new design. Using custom components in only certain interfaces is central to the electronics design, and such modularity depends heavily on the firmware. With respect to the housing, a key trade-off for integrating modularity is accommodating variability in radios while maintaining ergonomic design. A key consideration in both housing and electronic design is incorporating modularity only where needed, and creating components in-house when necessary.
1:30
Jenna Cotter and Andrew Atchley (University of Alabama in Huntsville, USA); Barbara Banz (Yale University School of Medicine, USA); Nathan Tenhundfeld (University of Alabama in Huntsville, USA)
According to the United States Department of Justice, 28 people in the United States die every day because of drunk driving. As self-driving vehicles become more prevalent, the ability of automated cars to determine when the driver is impaired and then take control could save many lives. Past research has looked at certain indicators for impaired driving, but to date there has been relatively little consideration of the potential interaction between an impaired driver and a self-driving vehicle.
In this paper, we will review the existing literature in order to recap the possible approaches vehicle manufacturers could take in establishing that a driver is impaired: physiological, behavioral, and vigilance monitoring. These approaches will be contrasted with one another. Following the review, we will propose several design solutions to be developed and tested. These solutions include a 'full', 'partial', and 'supervisory' takeover by the vehicle. Our full takeover proposed design will provide no opportunity for driver input at any point. Our partial takeover proposed design will involve a full takeover, but with additional impairment tests that the driver can perform in order to demonstrate capacity to drive. Finally, our supervisory takeover proposed design will involve the system actively monitoring performance in order to more quickly engage safety procedures (e.g. lane keep and emergency braking). The relative benefits and consequences of each design will be discussed with an eye towards existing theories on human-automation interaction. Finally, we will propose a path forward for design and testing. Taken together, this paper will present a novel consideration of a new avenue for human-machine teaming. Such considerations could be instrumental in saving thousands of lives each year, and helping to prevent countless other injuries.
1:45
Jordan Machita and Taylor R Rohrich (University of Virginia, USA); Yusheng Jiang (University of Virginia & UVA, USA); Yiran Zheng (University of Virginia, USA)
The field of education research suffers from a lack of replication of existing research studies. SERA (The Special Education Research Accelerator) is a proposed crowdsourcing platform being developed by a research team at the University of Virginia's School of Education that intends to help provide a solution by enabling large-scale replication of research studies in special education.
In this paper, we present our design and implementation of a cloud-based data pipeline for a research study that could serve as a model for SERA. Cloud-based design considerations include: financial cost, technical feasibility, security concerns, automation capabilities, reproducibility, and scalability [1] [17]. We have designed an architectural framework that practitioners in education research can use to host their studies in the cloud and take advantage of automation, reproducibility, transparency, and accessibility. Implementation of our platform design includes automating the data extraction and cleaning, populating the database, and performing analytics and tracking. Additionally, the project includes the development of a web-facing API for researchers to query the database with no SQL knowledge necessary as well as a web-facing dashboard to present select information and metrics to the applied research team. Our data pipeline is hosted on Amazon Web Services (AWS), which provides functionality for automation, storage, database hosting, and APIs. We present this architecture to demonstrate how data could flow through the pipeline of SERA to achieve the goals of large-scale replication research.
2:00
Stephen Mitchell and Jason Forsyth (James Madison University, USA); Michael S. Thompson (Bucknell University, USA)
The growing market for sports analytics has spurred more interest than ever in quantifying athletic performance. This trend, alongside the proliferation of new wearable technologies, has expanded the possibilities for both professional and amateur athletes to instrument themselves and collect meaningful data. The reactive strength index (RSI) can be used to communicate this kind of data by presenting a person's ability for rapid movement.
A user study was conducted in which young adults of amateur athletic status performed a jumping exercise to assess the feasibility of using a commercial off-the-shelf inertial measurement unit (IMU) to measure this metric compared to the usual method of using a force plate. Results suggest that the measurement of meaningful RSI improvements is possible using inexpensive IMUs with comparable results to costly force plates.

#### Data 1

Room: Data Track
Chairs: Faria Tuz Zahura (University of Virginia, USA), Cheng Wang (University of Virginia, USA)
1:15
Matthew Thomas (Inclusively, USA); Chad Sopata (University of Virginia, USA); Ben Rogers (USA); Spencer Marusco (Freddie Mac, USA)
Accurate forecasts of U.S. Presidential elections are not only central to political journalism, but are used by campaigns to formulate strategy, impact financial markets, and aid businesses planning for the future. However, as evidenced by the 2016 and 2020 elections, forecasting the election remains a challenging endeavor. Our review of methodologies revealed three discrete approaches: polling-based, demographic and economic fundamentals-based, and sentiment-based. We sought to identify which advantages each approach offers. We built on past research to adopt a novel forecast model that combines a weighted average of a hierarchical Bayesian fundamentals model and a Bayesian polling model. Our results indicated problems with polling-based methods because of inaccuracies in the polls, and better-than-anticipated accuracy in the fundamentals-only model.
1:30
Navya Annapareddy (University of Virginia); Emir Sahin, Sander Abraham and Md Mofijul Islam (University of Virginia, USA); Max DePiro (Perrone Robotics, USA); Tariq Iqbal (University of Virginia, USA)
Computer vision techniques have been frequently applied to pedestrian and cyclist detection for the purpose of providing sensing capabilities to autonomous vehicles and delivery robots, among other use cases.
Most current computer vision approaches for pedestrian and cyclist detection utilize RGB data alone. However, RGB-only systems struggle in poor lighting and weather conditions, such as at night, or during fog or precipitation, often present in pedestrian detection contexts. Thermal imaging presents a solution to these challenges as its quality is independent of time of day and lighting conditions. The use of thermal imaging input, such as that in the Long Wave Infrared (LWIR) range, is thus beneficial in computer vision models as it allows the detection of pedestrians and cyclists in variable illumination conditions that would pose challenges for RGB-only detection systems. In this paper, we present a pedestrian and cyclist detection method via thermal imaging using a deep neural network architecture. We have evaluated our proposed method by applying it to the KAIST Pedestrian Benchmark dataset, a multispectral dataset with paired RGB and thermal images of pedestrians and cyclists. The results suggest that our method achieved an F1-score of 81.34%, indicating that our proposed approach can successfully detect pedestrians and cyclists from thermal images alone.
1:45
Nolan K Alexander, David E Brenman, John Eshirow, Joshua Rosenblatt, Justin S Wolter and William Scherer (University of Virginia, USA); James Valeiras (Facebook, Inc., USA)
Social Network Services (SNS) are systems that allow users to build social relations with one another, with one of the largest SNSs being Facebook, totaling 2.6 billion active monthly users in 2020. Their live-streaming service, Facebook Live, is one of the fastest-growing branches of the company, allowing creators to synchronously broadcast original content to the public. However, in the rapidly growing world of technology, Facebook Live faces fierce competition from other live-streaming platforms (Twitch, YouTube Live, etc.) as well as other video-on-demand providers (Netflix, Hulu, TikTok, etc.).
To better understand current issues and future directions, our team focused on the Facebook Live platform, to develop a three to five-year strategic plan for the platform. We focus on Facebook Live's growth opportunities from a multitude of perspectives, including the competitive landscape, interface modification, future projections based on historical trends, and competitive analysis. Our approach utilizes the systems analysis process, focusing top-down on objectives and metrics. To produce a comprehensive strategy for future operations, we employ analytical methods ranging from quantitative data analysis to qualitative exploration of industry trends. These quantitative methods include statistical analysis, time-series forecasting, and natural language processing. Qualitative methods include domain research into the history, current state, and possible future for live-streaming. The forthcoming results of the complete analysis will be synthesized into a multi-recommendation strategic report to provide Facebook with flexible guidance for continuing operations. We also present a comments summarization and visualization feature for viewers and creators, three to five-year market forecasts after COVID-19 lockdowns, and attractive emerging markets including education and morning shows.
2:00
Stephen C Loftus and Sydney A Campbell (Sweet Briar College, USA)
Rankings (orderings of items from best to worst) are a common way to summarize a group of items, often at an individual level. These ranks are ordinal data, and should not be acted on by standard mathematical operations such as averaging. Thus, combining these individual rankings to get a consensus can present a difficult challenge. In this paper we present a novel method of combining rankings via a Bayesian hierarchical model, using rankings and their corresponding ratings (an assessment of item quality) to create a data augmentation scheme similar to established literature.
Simulations show that this method provides an accurate recovery of true rankings, particularly when the ranking system exhibits clustering within the structure. Additionally, this method has the added benefit of being able to describe properties of the rankings, including how preferred one item is to another and the probability that an individual will rank one item higher than another.

#### Optimization, Simulation, and Decision Analysis 1

Room: Optimization, Simulation, & Decision Analysis
Chairs: Debajyoti (Debo) Datta (University of Virginia, USA), Mehrdad Fazli (University of Virginia, USA)
1:15
Lily F Rohrbach (University of Oklahoma, USA); Pedro Huebner (University of Utah, USA)
Bioprinting is a rapidly emerging area of study within Tissue Engineering and Regenerative Medicine (TERM) where live cells are embedded in a solution (referred to as a bioink) and 3D printed into anatomically relevant geometries for in vivo implantation; this allows native tissues to regenerate better and faster. Most current research investigates the mechanical, rheological, and biological properties of bioinks, but few methods have been proposed to mathematically rank bioinks based on a set of application-focused, relevant criteria. In this study, we develop a general methodology to evaluate bioinks for the purpose of musculoskeletal tissue engineering, using multi-criteria decision making (MCDM) tools, including analytical hierarchy process (AHP), simple additive weighting (SAW), and technique for order of preference by similarity to ideal solution (TOPSIS). GelMA-alginate, dECM, PEG-fibrinogen, collagen, and HA were all evaluated using the criteria of cell viability, shear thinning, printability, degradation rate, storage ability, and cost. Results include a comparison matrix showing the relative importance of each criterion as well as a ranking of each selected bioink from each ranking method.
Future research should focus on building upon the proposed model by considering a larger set of bioink alternatives and criteria to demonstrate the model's ability to handle large datasets, as well as updating evaluations with more current data regarding the properties of bioinks.
1:30
Nolan K Alexander, William Scherer and Matt Burkett (University of Virginia, USA)
The Markowitz model is an established approach to portfolio optimization that constructs efficient frontiers allowing users to make optimal tradeoffs between risk and return. However, a limitation of this approach is that it assumes future asset returns and covariances will be identical to the assets' historical data, or that these model parameters can be accurately estimated, a notion which often does not hold in practice. Markowitz efficient frontiers are square roots of second-order polynomials that can be represented by three parameters, thus providing a significant dimensionality reduction of the lookback covariances and growth of the assets. Using this dimensionality reduction, we propose an extension to the Markowitz model that accounts for the nonstationary behavior of the portfolio assets' returns and covariances without the necessity to forecast the complex covariance matrix and asset growths, something that has proven to be extremely difficult. Our methodology allows users to forecast the three efficient frontier coefficients using a time-series regression. By observing similar efficient frontiers, this forecasted efficient frontier can be used to select optimal asset mean-variance tradeoffs (asset weights). For exploratory testing we employ a set of assets that span a large portion of the market to demonstrate and validate this new approach.
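The three-parameter form of the efficient frontier mentioned in the abstract above can be illustrated with the classical closed-form result for fully invested portfolios with short sales allowed (an assumption here; the abstract does not state its constraints). With mean vector m and covariance matrix S, and the scalars A = 1'S⁻¹1, B = 1'S⁻¹m, C = m'S⁻¹m, D = AC - B², the frontier risk at target return r is sqrt((A r² - 2 B r + C) / D). The sketch below is not the authors' code, covers only the dimensionality reduction (not the time-series forecasting step), and uses hypothetical function names and sample data.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def frontier_coefficients(mean_returns, covariance):
    """Return (a, b, c) such that frontier risk = sqrt(a*r**2 + b*r + c)."""
    ones = [1.0] * len(mean_returns)
    inv_ones = solve(covariance, ones)          # S^-1 1
    inv_mean = solve(covariance, mean_returns)  # S^-1 m
    A = sum(inv_ones)                                        # 1' S^-1 1
    B = sum(inv_mean)                                        # 1' S^-1 m
    C = sum(m * v for m, v in zip(mean_returns, inv_mean))   # m' S^-1 m
    D = A * C - B * B
    return A / D, -2.0 * B / D, C / D


def frontier_risk(target_return, coeffs):
    """Standard deviation of the frontier portfolio at a target return."""
    a, b, c = coeffs
    return math.sqrt(a * target_return ** 2 + b * target_return + c)


# Hypothetical lookback estimates for three assets.
means = [0.05, 0.08, 0.12]
cov = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]

coeffs = frontier_coefficients(means, cov)
print("frontier coefficients (a, b, c):", coeffs)
print("risk at 8% target return:", frontier_risk(0.08, coeffs))
```

Recomputing `coeffs` over successive lookback windows would give three time series, which is the kind of low-dimensional object a time-series regression could then forecast.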
1:45
Esra Çakır (Galatasaray University, Turkey); Mehmet Ali Taş (Turkish-German University, Turkey); Ziya Ulukan (Galatasaray University, Turkey)
Vaccination procedures, which are the most effective way to deal with the COVID-19 pandemic, have started worldwide. Since hospitals and health centers are used as vaccination centers, collecting people in one place can lead to the spread of other diseases. Therefore, the use of temporary vaccination clinics is encouraged for mass vaccination. In this study, the number of temporary clinics that need to be placed in candidate locations and the regions they serve are investigated. While the weights of the candidate places are determined by a single-valued neutrosophic fuzzy multi-criteria decision-making method, the temporary vaccination clinics are assigned to the candidate locations via a savings heuristic. The proposed neutrosophic fuzzy MCDM integrated savings heuristic methodology is applied to an illustrative example. The results are thought to be helpful in future multi-facility layout models.
2:00
Latifa M Hasan, Christopher McCharen, Ashley Scurlock, Congxin Xu and Brian Wright (University of Virginia, USA)
Pedagogical implementation research at large identifies and addresses factors affecting adoption and sustainability of evidence-based practices. Part of this work is elaborating and testing implementation strategies which prescribe educator intervention techniques and processes to adopt and integrate into educational settings. However, in practice, implementation research is slowed and constrained by the temporal and monetary costs of conducting manual evaluations. In this study, we use an automated, low-cost, and scalable natural language processing (NLP) approach we call TranscriptSim to assess intervention fidelity in a teacher coaching study (TeachSIM).
TranscriptSim quantifies similarity between the intervention protocol and intervention transcripts as an approximation of coaches' fidelity to the intervention protocol.

### Thursday, April 29 2:30 - 3:30 (America/New_York)

#### Workshop: Crafting an Effective Portfolio in User Experience Design

Room: Main Zoom Room
Chair: Moeen Mostafavi (University of Virginia, USA)
2:30
Gregory Gerling, Sara Riggs, Seongkook Heo, Panagiotis Apostolellis, Logan Clark and Courtney C Rogers (University of Virginia, USA)
In careers involving user interface/user experience (UI/UX) design, an effective portfolio is key for showcasing one's skills and knowledge to potential employers and collaborators. In this workshop, participants will gain insights into the fundamentals of UI/UX design, elements of effective portfolio design, and tools available to create a portfolio. This will be a highly interactive session as participants will interact with faculty and graduate students as well as with each other. Participants will be given the opportunity to have their pre-existing portfolio reviewed by the faculty and graduate student presenters in addition to members of the Human Factors and Ergonomics Society student chapter at the University of Virginia.

### Thursday, April 29 3:45 - 4:45 (America/New_York)

#### Energy and Environment 2

Room: Energy & Environment Track
Chairs: Md Mofijul Islam (University of Virginia, USA), Aya Yehia (University of Virginia, USA)
3:45
Javier Langarica, Matthew Aaron Perlow, Alexa L Solomon and Derek Ripp (The George Washington University, USA)
As many institutions are moving towards decarbonization, the coal industry is slowly declining and exploring new markets. During the last three years, teams from The George Washington University and Mississippi State University have worked on a joint research program that aims to produce three new environmentally friendly products derived from coal.
These patented products could potentially revitalize the coal industry. However, there is currently no existing production plant design, which is paramount to succeeding in this new market. The production plant consists of three production lines, one for each product. Every production line is unique and creates products with different properties using the same raw material. However, some processing units are used repeatedly by the same production line or even different production lines, while the project budget currently only allows the purchase and operation of one processing unit of each type. A plant design was laid out to represent the processing unit availability in the plant while meeting the specifications of each production line. The design was then modeled using the simulation software Simio to create a prototype of the real-life plant. The simulation of this model projected the production performance of the plant under the current design conditions. Three new alternative models were then simulated to explore productivity variations resulting from the addition of production silos to the plant. Production silos were demonstrated as an effective technique to increase productivity and utilization of specific processing units, while maintaining the same processing unit availability and staying within the budget. 4:00 Kevin Hoffman, Jae Yoon Sung and André Zazzera (University of Virginia, USA) In the current paradigm of planet formation research, it is believed that the first step to forming massive bodies (such as asteroids and planets) requires that small interstellar dust grains floating through space collide with each other and grow to larger sizes. The initial formation of these pebbles is governed by an integro-differential equation known as the Smoluchowski coagulation equation [1], to which analytical solutions are intractable for all but the simplest possible scenarios. 
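For reference, the continuous Smoluchowski coagulation equation is commonly written as below. The notation is the conventional one and is an assumption here, not taken from the abstract: $n(m,t)$ is the number density of grains of mass $m$ at time $t$, and $K(m, m')$ is the collision kernel giving the rate at which grains of masses $m$ and $m'$ merge.

$$
\frac{\partial n(m,t)}{\partial t}
= \frac{1}{2}\int_{0}^{m} K(m', m-m')\, n(m',t)\, n(m-m',t)\, dm'
\;-\; n(m,t)\int_{0}^{\infty} K(m, m')\, n(m',t)\, dm'
$$

The first term counts grains of mass $m$ created by mergers of two smaller grains; the second counts grains of mass $m$ lost by merging with any other grain.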
While brute-force methods of approximation have been developed, they are computationally costly, currently making it infeasible to simulate this process including other physical processes relevant to planet formation, and across the very large range of scales on which it occurs. In this paper, we take a machine learning approach to designing a system for a much faster approximation. We develop a multi-output random forest regression model trained on brute-force simulation data to approximate distributions of dust particle sizes in protoplanetary disks at different points in time. The performance of our random forest model is measured against the existing brute-force models, which are the standard for realistic simulations. Results indicate that the random forest model can generate highly accurate predictions relative to the brute-force simulation results, with an R² of 0.97, and do so significantly faster than brute-force methods.
4:15
Cameron D Bailey (University of Virginia, USA); Samantha Garcia, Hong Liang and Kenneth Ross (UVa, USA); Julie Quinn (University of Virginia, USA)
The Columbia River Power system is the country's largest renewable energy system, spanning several states and two countries. It provides one of the fastest growing regions in the continent with clean, reliable energy and protects thousands of square miles of land from flooding. The reservoirs on the Columbia River and its tributaries are responsible for many critical functions, such as flood prevention and mitigation, water quality and quantity assurance, and salmon reproduction. Despite these other objectives, the Columbia River Power system is the backbone of the region's energy supply, providing baseload when other renewable energy sources, namely wind and to a smaller extent solar, are unavailable. When hydropower cannot fill the gap, natural gas must instead, increasing reliance on fossil fuels.
The objective of our project is to analyze the energy output of the Columbia River Basin across multiple different climate change and energy demand scenarios to understand the impact that each of these possible futures has on the region's ability to transition to a cleaner energy future while meeting potentially growing demands. By utilizing multiple scenarios, uncertainty around hydrometeorological and socioeconomic conditions can be quantified and addressed. In this study, we analyze outputs in the middle of the 21st century from the California and West Coast Power System (CAPOW) model, customized to reflect each climate change and energy demand combination. Energy demand scenarios are quantified by Shared Socioeconomic Pathways (SSP) and climate change scenarios by CMIP5 Representative Concentration Pathways (RCP), providing projected trends until the end of the century. By varying low, middle, and high pathways across both the SSPs and RCPs, we can gain insights into the Pacific Northwest's energy health. This research has the potential to identify shortcomings in the current energy infrastructure, project the benefits and consequences of alternative development pathways, and increase understanding of the Columbia River Power system's greatest sensitivities (climatic or socioeconomic). Future work can build off of this knowledge to design more robust reservoir operating policies in the Columbia River Basin.

#### Health 2

Room: Health Track
Chairs: Jenn Campbell (University of Virginia, USA), Chang Xu (University of Virginia, USA)
3:45
Mary Blankemeier, Sarah Rambo, John Radossich, Charles Thompson, Donald Brown and Marcel Durieux (University of Virginia, USA); Christian Ndaribitse (University of Rwanda, Rwanda)
Five billion people, from disproportionately low and middle-income countries, are unable to access safe, timely, and affordable surgical and anesthesia care [1].
Patients in Africa are twice as likely to die after surgery when compared with the global average for postoperative deaths [2]. Given that most of this mortality happens after surgery, perioperative mortality rate (POMR) has been identified by the World Health Organization as a global measure of the quality of surgical procedures. Perioperative data collected during surgery can predict adverse surgical outcomes. Access to such data is essential for decreasing mortality rates and improving medical treatment. In many low and middle-income countries, data is often manually recorded on paper flowsheets, restricting the ability to discover medical trends and inhibiting easy and efficient data aggregation and analysis. Thus, systems put in place to digitize these flowsheets are key in utilizing data to improve overall healthcare. By streamlining the digitization of intraoperative flowsheets, more data will be collected while minimizing the time required and optimizing the quality. In order to optimize the digitization process, the research team has made several improvements to the current system, including a complete redesign of the digital upload process in the form of a mobile app that integrates scanner functionality and upload capability into one convenient and efficient step, thereby reducing the devices and platforms required to upload. This redesign also provides increased user feedback and corrects issues in which flowsheet uploads failed. In addition, improvements were made to the SARA (Scanning Apparatus for Remote Access). SARA is a wooden box designed to standardize the distance, lighting, and background for each scan, improving readability. Possible replacement power supplies and lighting sources are being examined for durability, ease of repair, and functionality. Additionally, usability testing and evaluation were completed to measure increases in successful task completion and decreases in time and steps required.
The goal of this project is to design a system to digitize the information contained in surgical flowsheets at the University Teaching Hospital of Kigali in Rwanda in the most efficient and effective manner. To accomplish this goal, the research team reduced the time and devices needed to upload a surgical sheet by 78% and 50%, respectively. Hardware and software malfunctions were fixed, and the longevity of the system was improved as procedural checklists for maintaining and correctly using the system were implemented.
4:00
Sihang Jiang, Kristen Maggard, Michael Porter and Heman Shakeri (University of Virginia, USA)
As the ongoing outbreak of Coronavirus Disease 2019 (COVID-19) severely affects the whole world, analysis of the transmission of COVID-19 is of growing interest. We focus on the application of compartmental models in the analysis of transmission of COVID-19 based on the detected viral load in wastewater and the reported number of cases. The measurement of COVID-19 RNA concentrations in primary sludge gives us information about the virus prevalence on a population level. Since the transmission of COVID-19 is a partially observed Markov process including different states, we consider a likelihood-based approach to our statistical inference to understand the inner relationship between different states and how COVID-19 actually transmits. Understanding the transmission dynamics of COVID-19 could inform public policy.

#### Systems Design 2

Room: Systems Design Track
Chair: Dany Dada (University of Virginia, USA)
3:45
John Beasley, Jack Burke, James Overby, Gregory Shelor, Casey Thompson and Ahmad Salman (James Madison University, USA)
The idea of attending a professor's office hours seems very basic to the average college student. The beginning of each semester brings about a wave of invitations to visit each professor in their office, at the allotted time for the section of their class.
Recent developments such as the growing prevalence of texting and email, as well as specific events such as the COVID-19 pandemic, have moved the norm away from these in-person meetings between students and professors, to the detriment of the students' education. The SmArt WhiteBoard Replacement Interactive Device (SAWBRID) is an innovative solution composed of an interactive device with a low-power screen that, through a user-friendly mobile application, makes the facilitation of office hours and the student/professor interactions outside of the classroom far more flexible and simple. As a whole, the project is centered around the individual professor, their schedule, and how that schedule is communicated. The SAWBRID sits in an accessible casing outside of the professor's office, relaying information about their schedule, available time slots to be scheduled through the mobile application, and personalized messages. The device is self-updating whenever a change is detected in the professor's schedule, or when they decide to update their personalized message. The student can access a professor's schedule through the mobile application and schedule an appointment, which will place their initials in the selected time slot on both the mobile application and the SAWBRID. The professor has a different interface to interact with their SAWBRID from the mobile application, giving them more control over their schedule, the personalized messages they want to display, and other features. We use security services such as confidentiality and authentication throughout the system to protect user credentials and user data and to ensure the privacy of the users. Our solution's effectiveness and performance are evaluated through power measurements to determine the device's ability to self-sustain for long periods of time and the ease of use.
4:00
Chuyang Yang, Zachary A. Marshall and John H. Mott (Purdue University, USA)
Mitigation of aircraft noise pollution is a core goal of managing environmental interactions and inherently aligns with the aim of reducing aviation emissions. As noise pollution adversely affects the operation and expansion of airports by restricting land zoning and flight patterns, developing precise noise generation and propagation models is imperative. Efforts to minimize the impact of aircraft noise are influenced by the accurate mapping of the geometric distribution of that noise. The Federal Aviation Administration (FAA) utilizes the Aviation Environmental Design Tool (AEDT) to model aircraft noise, emissions, and air quality consequences for regulatory compliance and system planning. Aircraft operations and fleet mix data are required when users execute AEDT to compute the noise exposure level. However, such data are difficult to obtain from non-towered airports that lack full-time air traffic facilities and personnel. Several airport operations estimation approaches developed by researchers have shown limitations in accuracy and cost-efficiency of deployment when tested, limiting their usefulness with regard to noise modeling. Meanwhile, tracking aircraft through onboard transponder data was validated as a cost-effective approach to estimate aircraft operations at non-towered airports. The authors designed a platform consisting of three modules. Flight operations and aircraft performance parameters can be estimated in the first module by deploying inexpensive hardware to collect aircraft transponder signals. The second module estimates aircraft noise levels by integrating information sources with the noise vs. power vs. distance (NPD) data from the EUROCONTROL Aircraft Noise and Performance (ANP) database. The third module visualizes airport noise impact based on a Geographic Information System (GIS) platform. Additionally, a risk assessment and a sustainability analysis are presented in this paper.
4:15
Soumya Chappidi, Laura E Gustad, Alexander Hu, Khin H Kyaw and Sara Riggs (University of Virginia, USA)
Dishwashers have become an integral part of most American households and with the advent of technology, there is an opportunity to integrate smart features such as Wi-Fi connectivity, new user interfaces, and autonomous operation. The focus of our work is to inform the design of a next generation dishwasher. Here we provide research on how smart technologies may change the paradigm in which users interact with their dishwasher and present an initial mobile prototype based on our research to improve the dishwashing experience. This study included three phases. For phase 1, we gathered data to inform the design of a mobile application prototype, drawing on 31 user interviews, seven user daily-use diary interviews, and 164 questionnaires. For phase 2, we developed a mobile application prototype. For phase 3, we evaluated the mobile application prototype with 10 users. Our results thus far have shown that users are frustrated with the lack of transparency and understanding of all the functions available with their dishwasher, but have positively responded to using a mobile application to improve their understanding of their dishwasher. Overall, we believe that smart technologies have the potential to revolutionize the dishwashing experience; however, we believe that an effective UI design is critical towards that goal.
4:30
Reid Auchterlonie, Chloe Brannock, Victoria Jackson, An Luong and Kiley Weeks (University of Virginia, USA); Rupa Valdez (University of Virginia)
Playgrounds can serve as an influential site in children's lives, but their designs and features often exclude those with disabilities and their social, emotional, and physical needs. This study was conducted in collaboration with Bennett's Village, a Charlottesville-based nonprofit seeking to build an inclusive playground.
The purpose of the study was to investigate the needs of adolescents and young adults in the disability space and to create a materials recommendation for playground surfacing. The parameters of these analyses were established and prioritized alongside Bennett's Village. For the qualitative needs assessment, the team recruited members from organizations focused on the disability community and had 6 participants in the semi-structured interviews and 77 participants in the survey. Through qualitative content analysis of interview and open-ended survey responses and descriptive statistics analysis of close-ended survey responses, we found that, among other trends, participants viewed playgrounds as a site for community and socialization, wanted open spaces that could serve a variety of purposes, and emphasized the importance of nature. For the materials recommendation, the team created a life cycle assessment and cost-benefit analysis and found that poured-in-place (PIP) rubber was the optimal surfacing material with regard to factors such as permeability, local weather factors, and traffic/usage. These findings will be passed on to Bennett's Village to use in their design of their playground and will also contribute to future inclusive playground design broadly.

#### Data 2

Room: Data Track
Chairs: Faria Tuz Zahura (University of Virginia, USA), Cheng Wang (University of Virginia, USA)
3:45
Melissa Portalatin and Omer F. Keskin (University at Albany - SUNY, USA); Sneha Malneedi (Shaker High School, USA); Owais Raza and Unal Tatar (University at Albany - SUNY, USA)
The imperative factors of cybersecurity within institutions have become prevalent due to the rise of cyber-attacks. Cybercriminals strategically choose their targets and develop several different techniques and tactics that are used to exploit vulnerabilities throughout an entire institution.
With thorough analysis practices now used in recent policy and regulation of cyber incident reports, it has been claimed that data breaches have increased at an alarming rate. Thus, capturing the trends in cyber-attack strategies, exploited vulnerabilities, and recurring patterns offers insight for improving cybersecurity. This paper seeks to discover the possible threats that influence the relationship between the human component and cybersecurity posture. Along with this, we use the Vocabulary for Event Recording and Incident Sharing (VERIS) database to analyze previous cyber incidents to advance risk management that will benefit the institutional level of cybersecurity. We elaborate on the rising concerns about external versus internal factors that potentially put institutions at risk of having their vulnerabilities exploited, and we conduct an exploratory data analysis that clarifies the detrimental monetary and data losses in recent cyber incidents. The human component of this research addresses the most common cause of cyber incidents: human error. With these concerns on the rise, we found contributing factors with the use of a risk-based approach and thorough analysis of databases, which will be used to improve the practical consensus of cybersecurity. Our findings can be of use to all institutions in search of useful insight to better their risk-management planning and the failing elements of their cybersecurity.
4:00
Morgan Freiberg, Kent J McLaughlin, Adinda Ningtyas, Oliver Taylor, Stephen Adams and Peter Beling (University of Virginia, USA); Roy Hayes (Systems Engineering, Inc., USA)
The immensely complex realm of naval warfare presents challenges for which machine learning is uniquely suited. In this paper, we present a machine learning model to predict the location of unseen enemy ships in real time, based on the current known positions of other ships on the battlefield.
More broadly, this research seeks to validate the ability of basic machine learning algorithms to make meaningful classifications and predictions of simulated adversarial naval behavior. Using gameplay data from World of Warships, we deployed an artificial neural network (ANN) model and a Random Forest model to serve as prediction engines that update as the battle progresses, overlaying probabilities over the battlefield map indicating the likelihood of the unseen ship being at each location. The models were trained and tested on gameplay data from a World of Warships tournament in which former naval officers served as commanders of competing fleets. This tournament structure ensured cohesive and coordinated naval fleet behavior, yielding data similar to that seen in real-world naval combat and increasing the applicability of our model. Both the Random Forest and ANN model were successful in their predictive capabilities, with the ANN proving to be the best method. 4:15 Pantea Ferdosian, Sean M Grace, Vasudha Manikandan, Lucas Moles, Debajyoti (Debo) Datta and Donald Brown (University of Virginia, USA) The growing field of customer experience management relies heavily on natural language processing (NLP). An important current use of NLP in this industry is to efficiently build sentiment models in new languages. These new language models will allow access to a greater range of clients. In this work, we examine the practical effectiveness and training data requirements of transfer learning methods, specifically mBERT and XLM-RoBERTa, for developing sentiment analysis models in German. To provide a meaningful comparison that excludes transfer learning, we also utilize and train an LSTM classification model. The models are tested by studying the performance gains for different amounts of target language training data. The results enable efficient building of NLP models by allowing prediction of the data requirements for a desired accuracy. 
4:30
Summer S Chambers, Kaleb Shikur and Stephen A Morris (University of Virginia, USA)
In this paper we adapt standard information retrieval techniques to a novel task, the mandatory regulatory review of public comments on proposed rule changes. The vast number of public comments exceeds the responsible agency's ability to manually review in the time allowed. Therefore, the agency requires an automated approach to efficiently sort and process the comments. To rank the public comments' relevance to rule sections, we implement a vector space model and compare the results to experts' reviews. We perform experiments over several indexing techniques to improve semantic relevance, splitting the regulatory document based on textual formatting, text length, and a hybrid method combining these two techniques. To improve the accuracy of our predictions, we test various synonym lists generated from a domain-specific ontology, as well as variations of standard stopword lists. By applying the relevance search as a multi-class classification problem, we find the method that most closely matches human reviews, achieving respective normalized discounted cumulative gain and mean average precision scores of 0.83 and 0.75 on our test data set.

#### Optimization, Simulation, and Decision Analysis 2

Room: Optimization, Simulation, & Decision Analysis
Chairs: Debajyoti (Debo) Datta (University of Virginia, USA), Mehrdad Fazli (University of Virginia, USA)
3:45
Luigi Raphael I. Dy, Kristoffer B. Borgen, John H. Mott, Chunkit Sharma, Zachary A. Marshall and Michael Kusz (Purdue University, USA)
The adoption of Automatic Dependent Surveillance-Broadcast (ADS-B) transponders has given researchers the ability to capture and record aircraft position data. However, due to the ADS-B system's characteristics, missing data may occur due to propagation anomalies and suboptimal aircraft orientation with respect to the ground-based receiver.
The nature of general aviation operations exacerbates this problem. As a result, it may be difficult to accurately review a general aviation aircraft's flight path with an adequate level of precision. To mitigate this, a five-dimensional modified Unscented Kalman Filter (UKF) was developed to produce statistically optimal aircraft position approximations during all flight phases. The researchers validated the UKF algorithm by comparing estimated flight paths to flight data logs from the Garmin G1000 flight instrument systems of Piper Archer aircraft used in flight training operations on February 23, 2021 at the Purdue University Airport (KLAF). Root mean square error (RMSE) was used to measure the filter's accuracy. The filter was found to accurately compensate for missing data. This research details the formulation, implementation, and validation of the filtering algorithm.
4:00
Omer F. Keskin, Nick J. Gannon, Brian Lopez and Unal Tatar (University at Albany - SUNY, USA)
Vulnerability Management, which is a vital part of risk and resiliency management efforts, is a continuous process of identifying, classifying, prioritizing, and removing vulnerabilities on devices that are likely to be used by attackers to compromise a network component. For effective and efficient vulnerability management, which requires extensive resources such as time and personnel, vulnerabilities should be prioritized based on their criticality. One of the most common methods to prioritize vulnerabilities is the Common Vulnerability Scoring System (CVSS). However, in its severity score, the National Institute of Standards and Technology (NIST) only provides the base metric values that include exploitability and impact information for the known vulnerabilities and acknowledges the importance of temporal and environmental characteristics to have a more accurate vulnerability assessment. There is no established method to conduct the integration of these metrics.
In this study, we created a testbed to assess the vulnerabilities by considering the functional dependencies between vulnerable assets, other assets, and business processes. The experiment results revealed that a vulnerability's severity significantly changes from its CVSS base score when the vulnerable asset's characteristics and role inside the organization are considered.
4:15
Philip G Halsey, Charlie Putnam, Aditi Rajagopal and Keith Wilson (University of Virginia, USA); Oliver Schaer (UVa, USA)
Recent changes in federal US regulations allowed debt collection agencies to expand their channels of communication from physical letters and phone calls only, to adopt digital communication channels including emails and SMS text messages. With changing demographics, debt collection companies stand to gain substantially if they can improve when and how they communicate with their account holders, both in driving more payments and in saving money through the reduced cost of the digital channels. This study explores the data provided by a leading debt collection agency on their debt holders. One of the key issues is that there is a limited understanding of the extent to which the new channels are affecting customer behaviors and payments in terms of frequency and timing. To answer these questions, we apply statistical models analyzing the impact of the various communication channels on the probability that a debtor pays and how much of their outstanding debt they pay. Initial analysis is based on A/B testing of the various customer segments and whether their payment activity increased. We then draw insights on the channel and the timing of communication on the amount of revenue earned by the agency using an adstock-like model. Modeling communication is key to understanding and better managing debt collection, increasing the likelihood of payment while indicating potential for saving costs and improving customer satisfaction.
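The abstract above does not specify the form of its adstock-like model. As a minimal sketch only, a common textbook form is geometric adstock, in which the effect of each contact carries over to later periods with a fixed decay; the function name, the `decay` value, and the sample contact counts below are all hypothetical.

```python
def geometric_adstock(contacts, decay):
    """Return the carried-over effect of a contact series.

    Each period's value is the contacts in that period plus `decay`
    times the previous period's adstocked value (assumed geometric form).
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    carried = 0.0
    adstocked = []
    for count in contacts:
        carried = count + decay * carried
        adstocked.append(carried)
    return adstocked


# Hypothetical weekly counts of messages sent to one account holder.
weekly_messages = [1, 0, 0, 2, 0]
print(geometric_adstock(weekly_messages, decay=0.5))
```

The adstocked series could then serve as a regressor for payment probability or amount, with a separate decay per channel; that modeling choice is likewise an assumption rather than the authors' stated method.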
4:30 Christopher M VanYe, Beatrice E Li, Andrew Koch, Mai Luu, Rahman O. Adekunle and Negin Moghadasi (University of Virginia, USA); Zachary A. Collier (Collier Research Systems, USA); Thomas Polmateer (University of Virginia, USA); David Barnes (Systems Planning and Analysis, USA); David Slutzky (University of Virginia, USA); Mark C. Manasco (Commonwealth Center for Advanced Logistics Systems, USA); James Lambert (UVa, USA) This paper addresses security and risk management of hardware and embedded systems across several applications. There are three companies involved in the research. First is an energy technology company that aims to leverage electric-vehicle batteries through vehicle to grid (V2G) services in order to provide energy storage for electric grids. Second is a defense contracting company that provides acquisition support for the DOD's conventional prompt global strike program (CPGS). These systems need protections in their production and supply chains, as well as throughout their system life cycles. Third is a company that deals with trust and security in advanced logistics systems generally. The rise of interconnected devices has led to growth in systems security issues such as privacy, authentication, and secure storage of data. A risk analysis via scenario-based preferences is aided by a literature review and industry experts. The analysis is divided into various sections of Criteria, Initiatives, C-I Assessment, Emergent Conditions (EC), Criteria-Scenario (C-S) relevance and EC Grouping. System success criteria, research initiatives, and risks to the system are compiled. In the C-I Assessment, a rating is assigned to signify the degree to which criteria are addressed by initiatives, including research and development, government programs, industry resources, security countermeasures, education and training, etc. 
To understand risks of emergent conditions, a list of Potential Scenarios is developed across innovations, environments, missions, populations and workforce behaviors, obsolescence, adversaries, etc. The C-S Relevance rates how the scenarios affect the relevance of the success criteria, including cost, schedule, security, return on investment, and cascading effects. The Emergent Condition Grouping (ECG) collates the emergent conditions with the scenarios. The generated results focus on ranking Initiatives based on their ability to negate the effects of Emergent Conditions, as well as producing a disruption score to compare a Potential Scenario's impacts to the ranking of Initiatives. The results presented in this paper are applicable to the testing and evaluation of security and risk for a variety of embedded smart devices and should be of interest to developers, owners, and operators of critical infrastructure systems.

#### Infrastructure, Networks, and Policy 2

Room: Infrastructure, Networks, & Policy Track
Chairs: Moeen Mostafavi (University of Virginia, USA), Samarth Singh (University of Virginia, USA)
3:45
Josh Eiland, Clare Hammonds, Sofia Ponos, Shawn Weigand and William Scherer (University of Virginia, USA)
Organizations in the nonprofit space are increasingly using data mining techniques to gain insights into their donors' behaviors and motivations. Data mining can be costly but can also be valuable in retaining and obtaining donors. Throughout the course of this project, we have prioritized two objectives. One is to increase the ratio of funds raised to dollars spent on fundraising from current donors, making these efforts more profitable. The other is to determine how to most effectively solicit new donors. To accomplish these goals, we have used statistical modeling and data analysis to gain insights and create recommendations related to donor optimization and acquisition.
To learn about the current donors, it is important to identify which unique traits make donors more likely to donate and whether those traits are related to an individual's demographic information or giving history. Our team is classifying donors into "states" of giving based upon different metrics, including how recently, how much, how often, and for how long they have donated. We are using various data models to create actionable recommendations on how to tailor fundraising appeals specifically to different donors, which will increase the Inn's overall donations and their return on fundraising investment. We are also mapping the transitions between these giving states so that donors dropping from higher states can be re-engaged, while donors with a high chance of moving into a more profitable state can be flagged and targeted. We will present these results in a dashboard that the Inn can use moving forward to better solicit each donor and maintain a steady fundraising revenue stream.
4:00
Branko Bokan (The George Washington University, USA); Joost Santos (George Washington University, USA)
To manage limited resources available to protect against cybersecurity threats, organizations must use a risk management approach to prioritize investments in protection capabilities. Currently, there is no commonly accepted methodology for cybersecurity professionals that considers one of the key elements of risk function - threat landscape - to identify gaps (blind spots) where cybersecurity protections do not exist and where future investments are needed. This paper discusses a new, threat-based approach for evaluation of cybersecurity architectures that allows organizations to look at their cybersecurity protections from the standpoint of an adversary. The approach is based on a methodology developed by the Department of Defense and further expanded by the Department of Homeland Security.
The threat-based approach uses a cyber threat framework to enumerate all threat actions previously observed in the wild and scores protections (cybersecurity architectural capabilities) against each threat action for their ability to: a) detect; b) protect against; and c) help in recovery from the threat action. The answers form a matrix called a capability coverage map - a visual representation of protection coverage, gaps, and overlaps against threats. To allow for prioritization, threat actions can be organized in a threat heat map - a visual representation of threat actions' prevalence and maneuverability that can be overlaid on top of a coverage map. The paper demonstrates a new threat modeling methodology and recommends future research to establish a decision-making framework for designing cybersecurity architectures (capability portfolios) that maximize protections (described as coverage in terms of protect, detect, and respond functions) against known cybersecurity threats.
4:15
Nathaniel Donkoh-Moore, Madeline McNult, Grace Boland, Patrick Leonard and Colin Cool (University of Virginia, USA); Neal Goodloe (Jefferson Area Community Corrections, USA); Loreto Alonzi, K. Preston White and Michael Smith (University of Virginia, USA)
About a third of current inmates in United States prisons and jails suffer from severe mental illness (Collier, 2014). For most of these inmates, their untreated mental health needs contribute to their return to custody within the criminal justice system. A 2011 study reported that approximately 68% of inmates with untreated mental illness and substance abuse diagnoses return to custody at least once within 4 years of the initial release, compared to 60% of those who do not suffer from either mental illness or substance abuse diagnoses (Bronson et al., 2017).
This project extends over a decade of prior research examining current mental health services available to those released from the Albemarle-Charlottesville Regional Jail (ACRJ). The primary objective of this project was to identify individuals within the ACRJ, which serves jurisdictions in Charlottesville, Albemarle, and Nelson County, who were recommended for services following screening through the Brief Jail Mental Health Screener (BJMHS), in order to answer questions surrounding the return-to-custody rate of those linked vs. not linked to services. To examine the demographics of inmates screened, types of charges, and length of stay in the criminal justice system, data sets were obtained from Region Ten Community Services Board (R10), ACRJ, Offender Aid and Restoration (OAR), and the Thomas Jefferson Area Coalition for the Homeless (TJACH) after each member of the team completed training on protecting personally identifiable information (PII) and signed a nondisclosure agreement (NDA). The research team analyzed 60 months of data spanning from July 2015 through June 2020. The data include individuals booked into ACRJ and individuals who received mental health, substance abuse, and intake/access/emergency services from R10. The data from ACRJ, the BJMHS, and R10 were merged to form a single data set. According to the merged data, of the individuals who took the BJMHS when they were booked into ACRJ, 26% screened in, meaning their BJMHS results indicated they should be referred for further mental health evaluation. The team analyzed the cohort of individuals who screened in and were available to receive services from R10 following their release from custody. The key findings and outcomes of the study included:
• From the ACRJ dataset from 2015 to 2019, 913 individuals screened in for referral to mental health services. This is 26% of the total inmates who were screened at ACRJ.
• Individuals who received services from R10 were more likely to return to custody (19%) within 12 months than screened-in individuals who did not receive these services (11%).
4:30
Gareth S Norris (George Washington University, USA); Anya Qureshi and Katelyn M Russo (The George Washington University, USA); Mariana Santander Gomez (George Washington University, USA)
This paper investigates efficiency improvement opportunities within homeless service systems in the United States through modeling and simulation. Homeless service systems in the United States continue to evolve but are challenged by facility capacity and operational constraints. In this paper, a Maryland county homeless service system is selected as the case study for analysis. Data is collected through personnel interviews, Housing and Urban Development (HUD) data, and summarized annual reports from the client. Using a regression analysis model, key variables in flow-rates to stable housing solutions are determined in order to construct a system dynamics model of the homeless service system. This model is run for a period of 2 years, using the simulation software Vensim to identify bottlenecks as potential areas of improvement within the system. Model success is defined by HUD system performance measures, such as the length of time persons remain homeless and the rates at which persons placed in stable housing solutions return to homelessness. The model is further evaluated with the findings from a directed literature search of related case studies, semi-structured interviews with industry personnel, and a comparison to national best practices. The model will be generalized to simulate the HUD system performance measures of other homeless service systems in the United States.
Additionally, the model will inform recommendations of identified improvements, such as altering ratios of case managers to facility occupants, modifying the intake assessment process, and optimizing facility programs for improved client flow. These recommendations will be applied to create a prototype dashboard for the client. This dashboard will be used as a forecasting tool to aid in decision-making affecting the operation of local homeless service systems.

### Thursday, April 29 7:15 - 9:15 (America/New_York)

#### Data 3

Room: Data Track
Chairs: Reid Bailey (University of Virginia, USA), Valerie Michel (University of Virginia, USA)
7:15
Alicia Doan, Nathan England and Travis Vitello (University of Virginia, USA)
With the ubiquity of Internet-based word of mouth to inform decisions on various products and services, people have become reliant on the authenticity of website reviews. These reviews may be manually evaluated for publishability onto a website; however, increasing volumes of user-submitted content may strain a website's resources for accurate content moderation. Recognizing the importance for patients of receiving authentic reviews of cosmetic surgery procedures, we considered a corpus of 523,564 user-submitted reviews to the RealSelf.com website spanning the dates of 2018-01-01 through 2020-05-31. Prior binary classifications of "published" or "unpublished" were applied to these reviews by the RealSelf content moderation team. Textual and behavioral machine learning models were developed in this study to predict the classification of RealSelf's reviews. An ensemble model, constructed from the top-performing textual and behavioral models in this study, was found to have a classification accuracy of 82.9 percent.
7:30
Jonathan A Gomez, Thomas Hartka, Binyong Liang and Gavin Wiehl (University of Virginia, USA)
Wikidata is a crowd-sourced knowledge base built by the creators of Wikipedia that applies the principles of neutrality and verifiability to data.
In its more than eight years of existence, it has grown enormously, although disproportionately. Some areas are well curated and maintained, while many parts of the knowledge base are incomplete or use inconsistent classifications. Therefore, tools are needed that can use the instantiated data to infer and report structural gaps and suggest ways to address these gaps. We propose a context matrix to automatically suggest potential values for properties. This method can be extended to evaluating the ontology represented by the knowledge base. In particular, it could be used to propose types and classes, supporting the discovery of ontological relationships that lend conceptual identification to the content entities. To work with the large, unlabelled data set, we first employ a pipeline to shrink the data to a minimal representation without information loss. We then process the data to build a recommendation model using property frequencies. We explore the results of these models in the context of suggesting type classifications in Wikidata and discuss potential extended applications. As a result of this work, we demonstrate approaches to contextualizing recently added content in the knowledge base as well as proposing new connections for existing content. Finally, these methods could be applied to other knowledge graphs to develop similar completions for the entities contained therein.
7:45
Maria Arango, Andrew M Hogue and Karyne Williams (University of Virginia, USA)
The ability to hold police accountable for the actions of officers is contingent upon independent review of data surrounding misconduct. This research identifies gaps in information and useful features to build a guide for policing data collection and analysis for public accountability, based on the practices of other cities around the US. Our primary concern is to obtain actionable and transparent data on police activities and identify recording practices to serve Charlottesville as examples to follow.
Prominent challenges include sourcing usable data, judging whether the data were comprehensive, and resolving inconsistencies across datasets that prevented joining and, in turn, limited meaningful data analysis. Our guide also advises Charlottesville of the ethical and methodological considerations when analyzing policing data. The final guide includes: recommendations for sharing data with the public through dashboard visualization, which was informed by a multi-city scrub of open policing data portals; recommendations for feature variables to collect for meaningful analysis; and a multi-city proof-of-concept data dashboard.
8:00
John R McNulty (University of Virginia, USA); Sarai Alvarez and Michael Langmayr (UVa, USA)
The Internet Archive seeks to provide "universal access to all knowledge" through their digital library, which includes a digital repository of over 475 billion crawled web documents in addition to other content. Of particular interest to those who use their platform is the preservation of and access to research, due to its inherent value. Research or scholarly work outside of mainstream institutions, publishers, topics, or languages is at particular risk of not being properly archived. The Internet Archive preserves these documents in its attempts to archive all content; however, these documents of interest are still at risk of not being discoverable due to a lack of proper indexing within this uncurated archive. We provide a preliminary classifier to identify and prioritize research, including long tail research, which circumvents this issue and enhances their overall approach. Classification is complicated by the fact that documents are in many different formats, there are no clear boundaries between official and unofficial research, and documents are not labeled. To address this problem, we focus on HTML documents and develop a semi-supervised approach that identifies documents by their provenance, structure, content, and linguistic formality heuristics.
We describe a semi-supervised machine learning classifier to filter crawled HTML documents as research, both mainstream and obscure, or non-research. Because the HTML datasets were not labelled, a provenance-based approach was used, in which provenance was substituted for labels. A data pipeline was built to deconstruct HTML website content into raw text. We targeted structural features, content features, and stylistic features, which were extracted from the text and metadata. This methodology provides the ability to leverage the similarities found across differing subjects and languages in scholarly work. The optimal classifier explored, XGBoost, predicts whether a crawled HTML document is research or non-research with 98% accuracy. This project lays the foundation for future work to further distinguish between mainstream and long tail research, both English and non-English.
8:15
Huilin Chang, Yihnew Eshetu and Celeste Lemrow (University of Virginia, USA)
The Internet Archive (IA), one of the largest open-access digital libraries, offers 28 million books and texts as part of its effort to provide an open, comprehensive digital library. As it organizes its archive to support increased accessibility of scholarly content to support research, it confronts both a need to efficiently identify and organize academic documents and a need to ensure an inclusive corpus of scholarly work that reflects a "long tail distribution," ranging from high-visibility, frequently-accessed documents to documents with low visibility and usage. At the same time, it is important to ensure that artifacts labeled as research meet widely-accepted criteria and standards of rigor for research or academic work to maintain the credibility of that collection as a legitimate repository for scholarship.
Our project identifies effective supervised machine learning and deep learning classification techniques to quickly and correctly identify research products, while also ensuring inclusivity along the entire long-tail spectrum. Using data extraction and feature engineering techniques, we identify lexical and structural features such as number of pages, size, and keywords that indicate structure and content that conforms to research product criteria. We compare performance among machine learning classification algorithms and identify an efficient set of visual and linguistic features for accurate identification, and then use image classification for more challenging cases, particularly for papers written in non-Romance languages. We use a large dataset of PDF files from the Internet Archive, but our research offers broader implications for library science and information retrieval. We hypothesize that key lexical markers and visual document dimensions, extracted through PDF parsing and feature engineering as part of data processing, can be efficiently extracted from a corpus of documents and combined effectively for a high level of accurate classification.

#### Friday, April 30

### Friday, April 30 8:15 - 8:20 (America/New_York)

#### Welcome to Friday at SIEDS

Room: Main Zoom Room

### Friday, April 30 8:30 - 9:30 (America/New_York)

#### Energy and Environment 4

Room: Energy & Environment Track
Chairs: Courtney C Rogers (University of Virginia, USA), Aya Yehia (University of Virginia, USA)
8:30
Emma C Kuttler, Buket Cilali and Kash Barker (University of Oklahoma, USA)
The effects of climate change will lead to the forced displacement of millions and will cause dramatic changes to human settlement and migration patterns. The most vulnerable populations will travel as environmental migrants through a complicated quasi-governmental resettlement system of aid camps in the hope of finding long-term placements.
These people deserve safe housing, and the location they permanently settle in has critical socio-political impacts. Prior research has generally focused on post-conflict or post-disaster relief location selection for a facility at a single point in time or single-period refugee resettlement, with even less work dedicated to environmental migration. Furthermore, the scale of this work is typically limited to a city or country with the geographic area available for relocation remaining static, while in a climate change scenario the habitable land changes over time. We extend the problem of single-period resettlement to multi-period resettlement using the technique for order preference by similarity to ideal solution (TOPSIS), a straightforward multi-criteria decision-making method. We propose a method to iterate resettlement across multiple planning periods and incorporate geospatial, cultural, environmental, and capacity criteria. The set of alternatives, or destination countries, will change with each planning period to represent the changing habitable environment. Ratios of weights between iterations remain constant. TOPSIS will produce a ranked list of destination sites. The methodology will be illustrated with a generated data set using a set of vulnerable source locations and a set of destination sites, both of which will change in each planning period. We found more variation in the rankings between periods than with standard TOPSIS, as well as greater sensitivity to weights. This work can be applied to any sort of long-term multi-criteria location selection problem (e.g., store openings and closings under changing consumer demand).
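The multi-period ranking idea in the abstract above can be illustrated with a minimal sketch. It assumes the textbook form of TOPSIS (vector normalization, Euclidean distances to the ideal and anti-ideal points), which the abstract does not spell out. The function names `topsis` and `rank_periods`, the site names, and the two criteria are hypothetical and are not the authors' data or implementation. The weights are held fixed across periods, while the set of candidate sites changes.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores (0 = worst, 1 = best), one per row.

    matrix  -- rows are alternatives, columns are criteria
    weights -- one weight per criterion (normalized here to sum to 1)
    benefit -- True where larger values are better, False where smaller are
    """
    n_crit = len(weights)
    total = sum(weights)
    w = [x / total for x in weights]

    # Vector-normalize each criterion column, then apply the weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [
        [w[j] * row[j] / norms[j] if norms[j] else 0.0 for j in range(n_crit)]
        for row in matrix
    ]

    # Ideal and anti-ideal points depend on the direction of each criterion.
    columns = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in v:
        d_pos = sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_neg = sqrt(sum((x - a) ** 2 for x, a in zip(row, anti)))
        denom = d_pos + d_neg
        scores.append(d_neg / denom if denom else 0.5)
    return scores


def rank_periods(periods, weights, benefit):
    """Rank the sites available in each planning period, best first.

    periods -- list of dicts mapping site name -> list of criterion values;
               the set of sites may differ from one period to the next
    """
    rankings = []
    for sites in periods:
        names = list(sites)
        scores = topsis([sites[n] for n in names], weights, benefit)
        ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
        rankings.append([name for name, _ in ranked])
    return rankings


# Hypothetical data: criteria are (capacity, climate risk).
# Site "A" is no longer habitable in period 2 and site "D" becomes available.
periods = [
    {"A": [10, 1], "B": [5, 2]},
    {"B": [5, 2], "D": [1, 9]},
]
for t, ranking in enumerate(rank_periods(periods, [1, 1], [True, False]), 1):
    print(f"period {t}: {ranking}")
```

Because the ideal and anti-ideal points are recomputed from whichever sites are available in a period, a site's score can change between periods even when its own criterion values do not.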
8:45
Thomas Anderson, Daniel Collins, Chloe Fauvel, Harrison Hurst, Nina Mellin, Bailey Thran, Andres Clarens and Arthur Small (University of Virginia, USA)
One of the principal challenges associated with decarbonization is the temporal variability of renewable energy generation, which is creating the need to better balance load on the grid by shaving peak demand. We analyzed how innovative load-shifting technologies can be used by large institutions like the University of Virginia to shift load and support statewide efforts to decarbonize. To do this, we focused on the University's plans for expansion of the Fontaine Research Park, which is a good model for understanding how these technologies could distribute energy load behind the meter. First, we worked to develop a predictive model to forecast when peak demands will occur and understand how interventions, including heat recovery chillers and thermal storage tanks, might be used to balance load. Then, we extended a statewide energy systems model using the Tools for Energy Modeling Optimization and Analysis (TEMOA) to simulate the ways in which these types of interventions might be scaled to the whole state. Using the energy demand model in conjunction with aggregated institutional energy use data, the team evaluated the effects that broader adoption of distributed energy technologies in Virginia could have on the grid's ability to handle the energy transition. Our study showed that implementing distributed energy sources at state scale had an insignificant effect on balancing load. However, at microgrid scale, such technologies prove to be a useful resource for decreasing peak demand, which would allow for further clean energy projects and possible cost reductions.
9:00
Devin P Simons, Declan R Tyranski, Zachary High and Karim Altaii (James Madison University, USA)
Water scarcity is a significant and escalating issue that is currently affecting every continent on the globe.
According to the United Nations, water usage has been increasing at more than twice the rate of population growth, and it is estimated that 4 billion people experience severe water scarcity during at least one month of the year [1]. Often, there is an abundance of water in the atmosphere, with nearly 12,900 cubic kilometers of water present at any time [2]. To take advantage of this water source, our solution involves the design of an Atmospheric Water Generator (AWG) that can provide water to areas with medium to high humidity. The design converts water vapor into a liquid while minimizing the amount of input energy needed. The Earth is known to have a relatively cool and constant temperature of around 55°F at shallow underground depths [3]. The design uses the ground as a thermal sink by incorporating an optimized underground heat exchanger. This is accomplished with a closed-loop system of geothermal piping configured in a helical arrangement and a circulating pump powered by a photovoltaic (PV) panel. This allows a liquid coolant (water) to be cooled underground and then utilized above ground. The above-ground portion of the system consists of a crossflow finned heat exchanger, which the cooled water enters in order to condense humidity from the air. Air at 26.7°C and 85% relative humidity is directed at a rate of 0.23 kg per second across the heat exchanger by a fan, which is powered by the PV panel. The air cools as it passes the heat exchanger, allowing water to condense; the condensate is collected, measured, and recorded via a remote data collection system. The system generates 200 milliliters of water every hour. Parameters such as temperatures, relative humidity, flow rate, and atmospheric conditions are also collected to verify the design and to model data for other regions around the world. It is an innovative design that offers a unique solution to help alleviate water scarcity.
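As a rough worked check of the operating point quoted above, the sketch below estimates how much water vapor the airstream carries per hour and what share of it the reported 200 mL per hour represents. It rests on assumptions that are not in the abstract: standard sea-level pressure, the Magnus approximation for saturation vapor pressure, a condensate density of 1 g/mL, and reading the 0.23 kg/s figure as the dry-air mass flow. The function and variable names are illustrative only.

```python
from math import exp


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation for saturation vapor pressure over water (hPa)."""
    return 6.112 * exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_ratio(temp_c, rel_humidity, pressure_hpa=1013.25):
    """Mass of water vapor per unit mass of dry air (kg/kg)."""
    vapor_pressure = rel_humidity * saturation_vapor_pressure_hpa(temp_c)
    return 0.622 * vapor_pressure / (pressure_hpa - vapor_pressure)


# Values quoted in the abstract.
AIR_TEMP_C = 26.7
REL_HUMIDITY = 0.85
AIR_FLOW_KG_S = 0.23   # assumed here to be the dry-air mass flow
YIELD_KG_H = 0.200     # 200 mL per hour, assuming 1 g/mL

w = humidity_ratio(AIR_TEMP_C, REL_HUMIDITY)
vapor_kg_h = AIR_FLOW_KG_S * w * 3600      # vapor passing the exchanger per hour
capture_fraction = YIELD_KG_H / vapor_kg_h

print(f"humidity ratio:   {w:.4f} kg/kg")
print(f"vapor throughput: {vapor_kg_h:.1f} kg/h")
print(f"captured share:   {capture_fraction:.1%}")
```

Under these assumptions the reported yield corresponds to a share of the passing vapor on the order of one to two percent; a different pressure or a moist-air reading of the flow rate shifts the figure only slightly.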
#### Health 4

Room: Health Track
Chairs: Jenn Campbell (University of Virginia, USA), Md Mofijul Islam (University of Virginia, USA)
8:30
John Bullock, Megan Grieco, Yingzheng Li, Ian Pedersen, Benjamin Roberson, Gracie Wright and Loreto Alonzi (University of Virginia, USA); Michael McCulloch (University of Virginia Children's Hospital, USA); Michael Porter (University of Virginia, USA)
There is substantial need to increase donor heart utilization in pediatric heart transplantation. Almost half of pediatric heart donors are discarded, despite nearly 20% waitlist mortality. Physicians have limited time to view heart condition data and decide whether to accept the donor heart once the heart becomes available. Due to the large amount of data associated with each donor heart and the lack of data-driven guidelines, physicians often do not have adequate metrics to determine acceptable heart quality. This research characterizes the differences in the clinical course between accepted and rejected pediatric donor hearts. A longitudinal study assessing the effect of static and dynamic measurements on the donor heart's function from the time of declaration of brain death to either disposal or heart procurement is developed by analyzing donor data via DonorNet, the system used by the United Network for Organ Sharing (UNOS) to match donors to a ranked order of recipients based on blood type, heart size, urgency status of the recipient, and other factors. Cardiovascular milieu (i.e., blood pressure, heart rate, medical management) and surrogate markers of organ perfusion, such as kidney and liver function, also inform our analyses and determine whether there are direct or indirect associations between these myriad markers and heart function. The study also analyzes the proportion of measurements in stable and acceptable ranges over time, as well as typical minimum, maximum, and final measurements for different functions.
All analyses are compared between accepted and rejected hearts using logistic regression and statistical analysis. Using the most recent measurements for each donor at 24 hours after brain death, the analysis identified significant factors in predicting donor heart acceptance: Left Ventricular Valve Dysfunction, Age, Shortening Fraction, and 4 Chamber Ejection Fraction. Additionally, visual tools were created as deliverables to help physicians decrease decision time and increase confidence in donor heart acceptance or rejection.
8:45
Gunnar Sundberg and Bayazit Karaman (Florida Polytechnic University, USA)
The diversity in responses to and conditions resulting from the COVID-19 pandemic in the United States has provided rich data for researchers to study, especially as the pandemic continues to progress. With more than a full year of data available in different regions and at different granularities, methods of analysis requiring larger datasets are now worth examining or refining. Furthermore, as the United States seeks to move away from national and state-wide policies into approaches focused on individual communities, open data must be provided at both the state and county levels. In this paper, a comprehensive database encompassing COVID-19 data and a large body of related data is proposed. The database includes data on cases and deaths, testing, mobility, demographics, weather, and more at both the US state and county levels. The system was implemented using the Python framework Django and the high-performance RDBMS PostgreSQL. A data-processing pipeline was implemented using the asynchronous task library Celery to gather and clean data from various verified sources. This database has been used to build a web application for concise reporting and an open API for public access to the data.
A reference web application using the API is currently available at www.bigdatacovid.com, and the API is available at www.bigdatacovid.com/api/v1, with API documentation available on the website.
9:00
Colleen B Callahan and Holden Bridge (University of Virginia, USA)
The United States Department of Defense (DoD) routinely seeks more efficient ways to examine genetic data applied to cases of foreign or domestic crime. The process of identifying biogeographic ancestry groups using forensic DNA data to provide investigative leads is currently performed on Single Nucleotide Polymorphisms (SNP). The motivation for this project was to determine whether SNP assessment of biogeographic ancestry can be replicated using analysis of autosomal Short Tandem Repeats (STR) while preserving predictive accuracy. Replacing SNP analysis with STR analysis is theoretically more efficient. STR data can be generated from a significantly smaller amount of DNA. Additionally, readily available genetic data can be analyzed well after collection. Moreover, in contrast to SNP analysis, STR analysis is more cost effective per sample. Several considerations for this paper were necessary:
1) Whether or not STR profiles at 24 loci can be distinguished into distinct clusters using microvariants and off-ladder alleles.
2) Given that there is identifiable clustering, whether or not these clusters can be probabilistically identified as members of biogeographic ancestry groups.
STR profiles consisting of 24 loci from N=2,348 subjects were analyzed. The present analysis employed multidimensional scaling (MDS), which provides a measure of dissimilarity between STR profiles and reduces the tabular profiles into two latent dimensions. Using the scaled MDS coordinates, a Gaussian Mixture Model (GMM) was constructed, which provides the probability that each data point belongs to each cluster.
Results from the model indicated separations between certain biogeographic ancestry groups, with the probabilities generated from the GMM providing a posteriori confidence levels for group membership. Such analyses may be of benefit for efforts in future crime investigation where biogeographic ancestry identification is needed.
9:15
Marissa Shand, Joseph Manderfield, Surbhi Singh and Clair E McLafferty (University of Virginia, USA)
Crohn's Disease (CD) diagnosis is a constant challenge for clinicians. Even with extensive magnetic resonance enterography (MRE) scans, identifying tissue damaged by CD can still be difficult, even for experts. Deep learning approaches for medical applications have recently gained traction as tools to complement radiologist consultation. Computer-aided diagnosis can potentially save time and labor resources spent on routine manual diagnosis. For imaging of the gastrointestinal tract, these cutting-edge techniques could help distinguish subtle structures indicative of CD that are not visible to the human eye. In this paper, we explore existing segmentation and neural network approaches more traditionally used for non-medical imaging and compare their diagnostic potential for identifying CD from MRE images.

#### Systems Design 4

Room: Systems Design Track
Chairs: Dany Dada (University of Virginia, USA), Fathima Rifaa Pakkir Mohamed Sait (University of Virginia, USA)
8:30
Michael Shane Flynn (The University of Alabama in Huntsville, USA); Hannah M Barr, Kristin Weger and Bryan Mesmer (University of Alabama in Huntsville, USA); Robert Semmens and Douglas L Van Bossuyt (Naval Postgraduate School, USA); Nathan Tenhundfeld (University of Alabama in Huntsville, USA)
The advancement of information technology has increased the prevalence of autonomous systems within day-to-day activities.
Autonomous systems save time for users, performing set tasks with increased speed and efficiency while simultaneously providing financial benefits. However, one of the biggest issues faced by designers and decision makers is the acceptance and adoption of such technologies. As such, research has drawn on an established body of literature regarding incentive mechanisms in order to motivate users to accept and adopt automated and autonomous systems. The objective of this paper is to provide a brief literature review on incentive mechanisms and subsequently provide design ideas for their inclusion. We will compare financial, social/reputation, and gamification-based incentive mechanisms and their relative efficacy in changing user behavior. Finally, we provide some avenues for future research that we believe to be particularly important and potentially fruitful.
8:45
Kundan Paudyal and Cameron MacKenzie (Iowa State University, USA)
Today's supply chains must be flexible, adaptable, and agile to respond quickly to customer demands. However, the supply chain is often neglected when a company is designing a new product. This paper outlines and explains how integrating supply chain design into the product design phase can improve engineering design and the supply chain logistics. First, the paper reviews the literature to discover the best practices for companies seeking to integrate supply chain into the design process. Examples of individual companies that adopted some of these techniques are provided. Second, we create a simulation of a decision model to analyze and quantify the benefits of designing the supply chain concurrently with designing a product. The simulation suggests that integrating supply chain design into the design phase can often lead to lower overall costs. The benefits of this integration increase if the costs of design and costs of supply chain are highly correlated.
9:00
Erin Hopkins (University of Virginia, USA); Jackie Mazzeo (UVa, USA); Vinh Nguyen, Emma Peck and Kelcie Satterthwaite (University of Virginia, USA); Carlos Lidón (King Digital Entertainment, Spain); Gregory Gerling (University of Virginia, USA)
Design systems are increasing in popularity, created to ensure a consistent aesthetic of graphics and interactions in websites and apps, and guide product development. They tend to include a set of standards, principles, and documentation. Due to their often-rigid requirements on structure and uniformity, traditional design systems can discourage creativity and customization. To forge a balance, the work develops a criteria-based evaluation tool, or 'scorecard', for assessing design components that incorporate principles of consistent, standardized practice, yet prioritize creative freedom. The evaluation scorecard allows inconsistencies to be managed in a collaborative and consensus-based manner. Users select parameters and metrics to evaluate the various elements of a design component. The tool calculates a score based on the number of parameters passed or failed. Scores below team-desired thresholds signal a need for further modification or redesign. Usability feedback, gathered through talk-aloud sessions and surveys in a focus group format, assesses the ease of use and efficiency of the tool and identifies gaps in functionality.

#### Data 4

Room: Data Track
Chair: Cheng Wang (University of Virginia, USA)
8:30
Vivian Austin, Zachary T McLane, Caroline O'Keeffe, Diyar Rashid and Ariana Zimmerman (University of Virginia, USA); Diana Franco Duran (Uva, USA); Arsalan Heydarian (University of Virginia, USA); Todd Bagwell (Hourigan, USA)
Construction projects of all kinds are plagued by inefficiencies, creating excess risk and leading to delays and cost overruns. Existing research has focused on analyzing delays in completed construction projects for forensic claims disputes.
However, this data could also be used to decrease the risk of future schedule delays through the use of predictive trend modeling and data analysis. This form of data analytics is becoming increasingly prevalent and valuable in the construction industry as a means of identifying and allowing for the prevention of potential delays. An interdisciplinary team at the University of Virginia (referred to as the capstone team) seeks to provide insight into delay causation and prevention for Hourigan, a general contracting and construction firm. This work focuses on the analysis of scheduling data and project teams' input from three medium-sized construction projects recently completed by Hourigan, referred to by the placeholder names projects A, B, and C. These data sets were interpreted using statistical analyses to assess correlations between owner, designer, or contractor-related delays and frequent delays. Interviews with the project team for each Hourigan project were conducted to obtain qualitative data regarding specific delay events. The main causes of delay for Project A were found to be the owner and designer; for Project B the designer and subcontractors; and for Project C the subcontractors, materials, and external factors. The capstone team also identified that Hourigan would benefit from recording more data related to project schedules as well as costs incurred due to specific delays. These findings will allow Hourigan to better manage, avoid, and overcome future challenges due to project delays. 8:45 Ronith Ranjan, Kasra Lekan and Vinay Bhaip (University of Virginia, USA) Open data, the distribution of universally available datasets, fosters transparency and accountability to serve the stakeholders of a community. Universities act as hubs of innovation and discovery where any individual can collaborate and freely explore new ideas in their endeavors to better the world. 
At many universities, including the University of Virginia (UVA), the incongruous communication of data between all stakeholders, and most especially its lack of clarity to students, contributes to student disengagement that threatens to compromise the vision of a transparent and effective university. Open data initiatives at colleges remain an underused tool to target campus improvement and to empower the next generation of civic-minded student leaders. This paper seeks to explore the strengths and needed improvements in current open data principles and projects throughout cities and colleges in the United States. Herein the authors will develop a framework for building an open data initiative at the University of Virginia through evaluating the best practices from similar projects. This research will directly lead into a student-led initiative to develop the Open Data Platform at UVA and will form the foundation of the principles and lessons to be applied at UVA and other similar open data projects elsewhere. 9:00 Kevin Finity, Ramit K Garg and Maxwell McGaw (University of Virginia, USA) Campaign speeches provide significant insight into how candidates communicate their message and highlight their priorities to various audiences. This study explores the campaign speeches of Donald Trump, Joseph Biden, Michael Pence, and Kamala Harris during the 2020 US presidential election using Natural Language Processing (NLP) techniques and a novel data pipeline of unstructured automated video captions. The intent of this effort is to evaluate the stylistic elements of the candidate speeches through elements such as formality, repetitiveness, topic variance, sentiment, and vocabulary choice/range to establish how candidates differ in their approaches and what effectively resonates with the voters. The NLP methods used include unsupervised similarity and clustering algorithms. 
Through this work, the results uncovered large stylistic differences amongst the candidates overall; however, more notably also indicate stark differences between the top and bottom of the Republican ticket compared to the Democratic ticket. The findings support the idea that the candidate pairs were selected strategically to cover the largest bloc of voters possible as part of the election process. 9:15 Pavan Kumar Bondalapati, Pengwei Hu, Shannon E Paylor and John Zhang (University of Virginia, USA) Research on the origins of planets and life centers around protoplanetary disks and protostars, for which the Atacama Large Millimeter/sub-millimeter Array (ALMA) has been revolutionary due to its ability to capture high-resolution images with exceptional sensitivity. Astronomers study these birthplaces of planets and their properties, which determine the properties of any eventual planets. The ALMA science archive contains over a petabyte of astronomical data which has been collected by the ALMA telescope over the last decade. While the archive data is publicly available, manually searching through many thousands of unlabelled images and ascertaining the type and physical properties of celestial objects is immensely labor-intensive. For these reasons, an exhaustive manual search of the archive is unlikely to be comprehensive and creates the potential for astronomers to miss objects that were not the primary target of the telescope observational program. We develop a Python package to automate the noise filtration process, identify astronomical objects within a single image, and fit bivariate Gaussians to each detection. We apply an unsupervised learning algorithm to identify many apparently different protostellar disk images in a curated ALMA data set. Using this model and the residuals from a bivariate Gaussian fit, we can flag images of an unusual nature (e.g. 
spiral, ring, or other structure that does not adhere to a bivariate Gaussian shape) for manual review by astronomers, allowing them to examine a small subset of interesting images without sifting through the entire archive. Our open-source package is intended to assist astronomers in making new scientific discoveries by eliminating a labor-intensive bottleneck in their research. #### Infrastructure, Networks, and Policy 4 Room: Infrastructure, Networks, & Policy Track Chairs: Moeen Mostafavi (University of Virginia, USA), Samarth Singh (University of Virginia, USA) 8:30 Anna Madison, Abigail Arestides, Stephen Harold, Tyler Gurchiek and Kai Chang (United States Air Force Academy, USA); Anthony Ries (ARL, USA); Nathan Tenhundfeld (University of Alabama in Huntsville, USA); Elizabeth Phillips (George Mason University, USA); Ewart de Visser (United States Air Force Academy, USA); Chad C Tossell (USAF Academy, USA) With the increased availability of commercially automated vehicles, trust in automation may serve a critical role in the overall system safety, rate of adoption, and user satisfaction. We developed and integrated a novel measurement system to better calibrate human-vehicle trust in driving. The system was designed to collect a comprehensive set of measures based on a validated model of trust focusing on three types: dispositional, learned, and situational. Our system was integrated into a Tesla Model X to assess different automated functions and their effects on trust and performance in real-world driving (e.g., lane changes, parking, and turns). The measurement system collects behavioral, physiological (eye and head movements), and self-report measures of trust using validated instruments. A vehicle telemetry system (Ergoneers Vehicle Testing Kit) uses a suite of sensors for capturing real driving performance data. 
This off-the-shelf solution is coupled with a custom mobile application for recording driver behaviors, such as engaging/disengaging automation, during on-road driving. Our initial usability evaluations of components of the system revealed that the system is easy to use, and events can be logged quickly and accurately. Our system is thus viable for data collection and can be used to model user trust behaviors in realistic on-road conditions. 8:45 Cassidy J Anderson, William J Hinkle, Lachlan SC Hudson, Ethan Keck, Trevor R Kraeutler, Zach Wenzler, Adam J Zahorchak and Jacquelyn Nagel (James Madison University, USA) From January 2010 to November 2019, derailments in the United States have cost railroad companies over350,000,000 in damages in addition to the costly environmental remediation cleanups and injuries. The State of Virginia is home to many small scale railroads that have less than a total of 100 miles of track between destinations. Manual inspection is common, but inspectors can miss small details, leading many railroads to supplement with autonomous inspection. Many small scale railroads do not have the resources to autonomously inspect their tracks and cannot afford the cost of large scale railroad inspection equipment in addition to the resources needed to run these inspection systems. Surface level defects are the most common reason for derailment and can lead to serious damage to the train and track if left untreated. The ultimate goal of the project is to prevent or greatly decrease the likelihood of train derailment by focusing on the detection of surface level defects on rails, which will have an impact on all railroads while especially helping local short line railroads. By creating an inspection system that works for all railroad companies, anyone can more accurately and precisely find surface level defects than current manual inspections. 
Working with industry experts, the most common and dangerous surface level defects and the appropriate methods of detection were determined. The resulting solution is the design of a small autonomous rail cart that takes images of the track from a video feed and sorts them into "good" or "bad" photos using machine learning. Locations with "bad" rail can then be re-inspected by manual inspectors to determine the severity of the problem and work towards fixing it. This system will identify and alert users to dangerous levels of rail damage while providing a cheaper inspection alternative for smaller railroads. It also acknowledges and takes advantage of the expertise and abilities of trained manual inspectors to make the final decision.
9:00
Thomas R Gresham, Joshua Kim, James McDonald, Nick Scoggins, Moeen Mostafavi, B. Brian Park, Michael Porter, Michael E Duffy and Sandra A Smith (University of Virginia, USA)
In scores of vehicle fleets, telematic tracking systems provide fleet managers with information regarding energy consumption, compliance with safety regulations, and driver performance. For a University's Facilities Management (FM) Fleet to take the next steps towards an elevated Sustainable Fleet accreditation and improved overall team performance, management has recognized the importance of effective energy and safety tracking methods combined with data analytics and a comprehensive systems analysis to aid the reinforcement, training, and maintenance of safe and sustainable driving practices by fleet drivers. This paper outlines the design of a unique safety and eco-driving training program that prompts University FM drivers to reflect on, learn about, and develop mindful driving habits that reduce environmental impact, cost, and risk. We analyzed historical driver behavior data, including idling time, harsh acceleration, and crash incident details, to identify risk factors and areas for significant improvement. Educational aspects of the training program were influenced by focus groups, interviews with industry experts, and professional fleet training modules. The efficacy of the customized training program was assessed through a statistical evaluation of telematic data collected before and after the training program was delivered. The results indicate that agency-specific mindful driving training produced statistically significant improvements, relative to the control group, in five of the six behavioral metrics measured: idling time, seat belt usage, speeding, hard acceleration, and hard braking.
9:15
Seanna Adam, Caroline Glazier, Brian Coward, Grayson DeBerry, Evan Magnusson and Mehdi Boukhechba (University of Virginia, USA)
The goal of this work is to investigate novel proximity detection techniques by testing various sensor technologies and assessing their feasibility in an athletic context. COVID-19 has challenged sports teams to come up with reasonable and easy-to-implement solutions to provide a safe training environment for their players and staff. For this reason, proximity data is more important than ever, as many teams need a way to measure social distancing and maintain contact tracing of their athletes. Bluetooth has been widely used to detect colocation and monitor social distancing. However, there are many other sensing technologies that may prove to be more accurate, robust, and secure. Therefore, the focus of this work is to investigate how Bluetooth compares with ultra-wideband (UWB) and ultrasound technologies when monitoring the distance between users. We implemented and compared the three modalities in a controlled experiment to investigate their accuracy at detecting distance between users at various levels. Our results indicate that UWB signals are the most accurate at monitoring co-location. This is in line with previous research suggesting that Bluetooth cannot accurately measure the distance between fast-moving objects and needs about 20 seconds to stabilize distance measurements; therefore, it is not feasible to use for sports. In addition, UWB models yielded an accuracy of over 95%, ultrasound correctly classified over 80% of observations, and Bluetooth had an accuracy of less than 50% when predicting whether a given signal is within 6 feet.

### Friday, April 30 9:45 - 10:45 (America/New_York)

#### Energy and Environment 5

Room: Energy & Environment Track
Chairs: Jay Fuhrman (University of Virginia, USA), Aya Yehia (University of Virginia, USA)
9:45
Christina Berger, Kayleigh J Calder, Sarah Cassway and Caroline Walton (The George Washington University, USA)
Energy management tools have become essential for high-occupancy buildings because they allow building managers to understand their building's energy efficiency and identify areas of environmental waste. However, a published and customizable energy management tool standardized for all high-occupancy buildings currently does not exist, requiring individual companies to spend time and money creating one that more accurately models their building's energy efficiency. In this project, we developed the prototype Dean Dashboard, a customizable energy management tool that analyzes a building's energy usage and optimizes the cost to meet a goal of obtaining the Leadership in Energy and Environmental Design (LEED) Operation and Maintenance (O&M) certification. This project uses an M.C. Dean operated-and-maintained facility as the case model. M.C. Dean is a design-build company for mission-critical facilities. The Dean Dashboard comprises five key features: exponentially weighted moving average (EWMA) forecasting models, EWMA control charts, energy efficiency metric calculations, a LEED score optimization model, and a system improvement analysis. Through the integration of these features, the dashboard provides detailed insights into a building's energy consumption and recommends how the building can improve its energy efficiency.
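As an illustration of the EWMA forecasting and control-chart features named in the abstract, the following is a minimal sketch of the textbook formulas, not the Dean Dashboard's implementation. The smoothing weight `lam = 0.2`, the limit width of 3 standard deviations, and the daily energy readings are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import mean, stdev


def ewma(values, lam=0.2, start=None):
    """Return the EWMA series z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z = mean(values) if start is None else start
    smoothed = []
    for x in values:
        z = lam * x + (1 - lam) * z
        smoothed.append(z)
    return smoothed


def ewma_control_limits(center, sigma, n, lam=0.2, width=3.0):
    """Return a (lower, upper) control-limit pair for each of n time steps."""
    limits = []
    for t in range(1, n + 1):
        # Standard time-varying EWMA limit; it widens toward an asymptote.
        half = width * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        limits.append((center - half, center + half))
    return limits


# Hypothetical daily energy use in kWh (assumed data, not from the paper).
baseline = [100, 102, 98, 101, 99, 100, 103, 97]   # in-control history
recent = [101, 104, 108, 111, 115, 118]            # readings drifting upward

center, sigma = mean(baseline), stdev(baseline)
z = ewma(recent, start=center)
limits = ewma_control_limits(center, sigma, len(recent))

# A point is flagged when its EWMA value leaves the control limits.
flags = [not (lo <= zt <= hi) for zt, (lo, hi) in zip(z, limits)]
forecast = z[-1]  # the latest EWMA value doubles as a one-step-ahead forecast
```

With these assumed readings the chart stays in control for the first two days and flags every day after that, which is the kind of sustained drift an EWMA chart is designed to catch earlier than a chart of raw values.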
10:00
Jacob L Rantas, David Wang and William E Jarrard (University of Virginia, USA); James R Sterchi (Submission, USA); Alan Wang, Mahsa Pahlavikhah Varnosfaderani and Arsalan Heydarian (University of Virginia, USA)
This project seeks to investigate the under-addressed issue of indoor environmental quality (IEQ) and the impacts these factors can have on human health. The recent COVID-19 pandemic has once again brought to the forefront the importance of maintaining a healthy indoor environment. Specifically, improved indoor air flow has been shown to reduce the risk of airborne virus exposure. This is extremely important in the context of hospitals, which contain high concentrations of at-risk individuals. Thus, creating a healthy indoor space is critical to improving public health and COVID-19 mitigation efforts. To create knowledge and provide insight on environmental qualities in the hospital setting, the authors have designed and built an interface to deploy in the University of Virginia Hospital Emergency Department (ED). The interface will display room-specific light, noise, temperature, CO2, humidity, VOC, and PM2.5 levels measured by the low-cost Awair Omni sensor. These insights will assist ED clinicians in mitigating disease spread and improving patient health and satisfaction while reducing caregiver burden. The team addressed the problem through agile development involving localized sensor deployment and analysis, discovery interviews with hospital clinicians and data scientists throughout, and the implementation of a Django interface application following human-centered design. Furthermore, a literature survey was conducted to ascertain appropriate thresholds for the different environmental factors. Together, this work demonstrates opportunities to assist and improve patient care with environmental data.
10:15
Henry C Quach, Hannah Hiscott, Harrison Mazanec and Sahil B Mehta (University of Virginia, USA)
Small island developing states (SIDS) are extremely susceptible to damage from intensifying climate change, including hurricanes and typhoons whose effects have been exacerbated by higher storm surges due to sea level rise and by more intense winds due to higher ocean surface temperatures in the tropics. Hurricanes can severely damage domestic food production on SIDS while simultaneously compromising the infrastructure for food imports. According to the Food and Agriculture Organization of the United Nations (2017), almost every SIDS imports over 60% of its food supply, and over 50% of SIDS import over 80%. Compounding the damaging effects of hurricanes, competition for water resources in SIDS poses a significant challenge that is exacerbated by increasing population density and intensifying climate change: 71% of SIDS are at risk of water scarcity, while 73% are at risk of groundwater pollution. This presents a new challenge for equitable access to freshwater resources that will also adversely affect the global food system. As such, SIDS present a case study of a region that is extremely vulnerable to food insecurity due to intensifying climate change, decreasing amounts of arable land, decreasing availability of freshwater resources, and increasing global population, which all threaten the global food system (supply, production, processing, distribution, and consumption) due to disruptions in conventional crop cultivation (CCC). Our goal is to assess the potential for Microgrid Supported Open Hydroponic Crop Cultivation (MSOHCC) to be an effective complement to current food security initiatives in SIDS. As part of this overarching goal, we will start by determining how Hydroponic Crop Cultivation (HCC) in general can be an alternative to CCC in providing food security. We will then determine how MSOHCC can promote sustainable agriculture specifically in SIDS by providing climate resilience and energy-efficient solutions. We will finally determine how MSOHCC can deliver economic opportunity to local SIDS economies by giving local residents the ability to produce locally grown food. The project team will grow lettuce seeds in a prototype MSOHCC unit that is powered by a solar panel. The growing conditions will be akin to those that may be encountered in SIDS. For the MSOHCC unit itself, the team will measure the amount of lettuce harvested (kg), water used (L), energy used (kW), and land area utilized (sq. m). The results will be compared to common lettuce yields from CCC methods to see if MSOHCC can be used as an alternative and/or as a supplement to CCC. The project team will assess the environmental, social, and commercial viability of MSOHCC in SIDS, using the Bahamas as a case study, by analyzing the conditions within the Bahamas and evaluating them against criteria that measure different indices of performance. We expect to have a finished "score card" that evaluates the capacity of the Bahamas, and by extension of SIDS, to utilize MSOHCC. Depending on the finished results, the viability of MSOHCC will be determined.

#### Health 5

Room: Health Track
Chairs: Jenn Campbell (University of Virginia, USA), Md Mofijul Islam (University of Virginia, USA)
9:45
Nikki Aaron, Prabhjot Singh, Siddharth Surapaneni and Joseph Wysocki (University of Virginia, USA)
Cancer genomics has focused primarily on identifying and studying mutations that are over-represented in known genes. This project applied methods that scan entire chromosomes and label densely mutated loci as "genomic probabilistic hotspots" (GPHs). A GPH is defined as any area on a patient's chromosome where the observed rate of mutations over the positions of a given chromosome window far exceeds what would be expected from random variation. The approach is applied to 39 patients diagnosed with large granular lymphocyte (LGL) leukemia, a rare form of blood cancer. In order to calculate expected mutation rates in non-LGL patients, data were obtained from the 1000 Genomes Project. A negative binomial test was employed to isolate specific GPHs where mutations within the LGL patient sample were significantly enriched. The negative binomial approach identified a median of 1 to 2 patient hotspots per chromosome, with a mean Jaccard distance between patients of 0.90. A kernel density estimation (KDE) method found a median of 40 hotspots with a wider span, resulting in a mean Jaccard distance of 0.43. The results from the negative binomial approach indicated heterogeneity between hotspot locations, whereas KDE results were more homogeneous. The negative binomial approach is best for pinpointing the most significantly dense regions, whereas KDE is best for identifying all broad regions that are more mutated than a reference. These new, gene-agnostic approaches provide novel methods to search chromosomes for mutational abnormalities and can be generalized and scaled to any clinical syndrome. Future directions include extending the GPH method across genomes and developing a robust library of disease- and/or model species-specific hotspot profiles. These may serve as reference guides in studies seeking to understand the exact biochemical processes driving the onset and progression of rare cancers.
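The mean Jaccard distance reported in the abstract can be illustrated with a short sketch. It assumes each patient's hotspots are represented as a set of (chromosome, window index) labels; that representation and the three patients below are hypothetical, and the authors' actual encoding is not described here.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(hotspots_by_patient):
    """Average the Jaccard distance over every unordered pair of patients."""
    pairs = list(combinations(hotspots_by_patient.values(), 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical hotspot windows, labeled as (chromosome, window index).
hotspots = {
    "patient_1": {("chr1", 12), ("chr1", 13), ("chr7", 4)},
    "patient_2": {("chr1", 12), ("chr9", 30)},
    "patient_3": {("chr2", 8)},
}
mean_distance = mean_pairwise_distance(hotspots)
```

A value near 1 means patients rarely share hotspot windows (heterogeneous locations), while a value closer to 0 means their hotspot sets largely coincide.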
10:00
Jae Hyun Lee (University of Virginia, USA)
Coronavirus disease 2019 (COVID-19) became a part of everyday life in 2020. Many people have turned to online social media platforms to share what they think and how they feel about the sudden impact the pandemic has brought upon us. This project aims to study public attitudes toward COVID-19 on Twitter, a popular social network platform. In particular, it focuses on discovering what issues around COVID-19 people are discussing, why they are interested in such topics, and how their emotions have evolved over time. The study further seeks to reveal potential associations between the outbreak and any hidden idea previously unknown to the general public. The dataset was created by collecting approximately 150,000 tweets with keywords or hashtags related to COVID-19 over the course of four weeks using Python and the Twitter API. A comprehensive analysis of the tweets was performed using natural language processing methodologies including topic modeling, sentiment analysis, and word embedding. The results suggest that many people may be failing to practice appropriate safety measures to stop the spread, despite their high interest in the COVID-19 crisis. In other words, their proactive online actions are not influencing their offline, real-life behaviors.
10:15
Rachel Filderman and Buckley Dowdle (University of Virginia, USA); Youssef Abubaker (2020 Treetop Drive, USA); David A Vann (University of Virginia, USA)
Large granular lymphocyte (LGL) leukemia is a rare, chronic leukemia associated with clinical manifestations of anemia (RBC < 4.5 million/mcL in males, < 4 million/mcL in females), neutropenia (ANC < 1500/mm³), and autoimmune disease. Progress has been made in identifying a significant and frequent mutation in the STAT3 gene among LGL patients; however, STAT signaling is still largely unexplained in about 60% of those LGL leukemia patients lacking STAT3 mutations. This paper sought to confirm previous studies regarding the association of a STAT3 mutation with clinical manifestations, as well as search for other significant mutations across the rest of the genome in order to determine whether the specific clinical features of autoimmune disease, anemia, and neutropenia present in LGL leukemia patients are associated with additional genomic mutations in LGL cells. As LGL leukemia is rare, presents heterogeneous conditions, and does not have a high mutation burden, our approach is distinct from standard approaches in cancer research where mutation rates are much higher. Methods of dimension reduction are employed in tandem with association analysis and decision trees to search for signals between significant genetic mutations and clinical manifestations of anemia, neutropenia, and autoimmune disease within the LGL patient sample. Results indicate an association exists between anemia and concurrent mutations in STAT3 and TTN (p = 0.03) in T-LGLL patients. Additionally, an association was identified between neutropenia and a mutation in either TTN (p = 0.049) or STAT3 (p = 0.03) in T-LGLL patients as well. These findings imply that TTN may be responsible for STAT activation in combination with a STAT3 mutation or independently in T-LGLL patients. Through XGBoost, 66% accuracy was achieved in predicting neutropenia and 55% accuracy in predicting anemia using gene mutations as the predictor variables. However, the relatively small sample size (N = 116 patients) raises concerns about limited statistical power and about how often these findings would be replicated in independent samples. The ideal sample size needed for an association test to have adequate statistical power was examined. Additionally, a review of past LGL leukemia publications was undertaken to compare the statistical power of their reported analyses. To obtain satisfactory statistical power in analyzing the association between a STAT3 gene mutation and neutropenia in the T-LGLL population (e.g., p ≤ 0.01, power = 0.9), the T-LGLL sample size must be at least 312 patients. This sample size exceeds not only that of the present study but also those of the majority of studies in the LGLL literature. The pooling of extant LGLL datasets as well as the undertaking of new, major multi-site trials is, therefore, warranted.
10:30
Andrew J. Graves, Cory Clayton, Joon Yuhl Soh, Gabriel Yohe and Per B. Sederberg (University of Virginia, USA)
Brain Computer Interfaces (BCI) decode electroencephalography (EEG) data collected from the human brain to predict subsequent behavior. While this technology has promising applications, successfully implementing a model is challenging. The typical BCI control application requires many hours of training data from each individual to make predictions of intended activity specific to that individual. Moreover, there are individual differences in the organization of brain activity and low signal-to-noise ratios in noninvasive measurement techniques such as EEG. There is a fundamental bias-variance trade-off between developing a single model for all human brains vs. an individual model for each specific human brain. The Robust Shared Response Model (RSRM) attempts to resolve this trade-off by leveraging both the homogeneity and heterogeneity of brain signals across people. RSRM extracts components that are common and shared across individual brains, while simultaneously learning unique representations between individual brains. By learning a latent shared space in conjunction with subject-specific representations, RSRM tends to result in better predictive performance on functional magnetic resonance imaging (fMRI) data relative to other common dimension reduction techniques. To our knowledge, we are the first research team attempting to expand the domain of RSRM by applying this technique to controlled experimental EEG data in a BCI setting. Using the openly available Motor Movement/Imagery dataset, the decoding accuracy of RSRM exceeded that of models whose input was reduced by Principal Component Analysis (PCA), Independent Component Analysis (ICA), and subject-specific PCA. The results of our experiments suggest that RSRM can recover distributed latent brain signals and improve decoding accuracy of BCI tasks when dimension reduction is implemented as a feature engineering step. Future directions of this work include augmenting state-of-the-art BCI with efficient reduced representations extracted by RSRM. This could enhance the utility of BCI technology in the real world. Furthermore, RSRM could have wide-ranging uses across other machine learning applications that require classification of naturalistic data using reduced representations.

#### Systems Design 5

Room: Systems Design Track
Chairs: Dany Dada (University of Virginia, USA), Fathima Rifaa Pakkir Mohamed Sait (University of Virginia, USA)
9:45
Zachary Yorio, Samy S. El-Tawab and M. Hossain Heydari (James Madison University, USA)
Contact tracing has become a vital practice in reducing the spread of COVID-19 among staff in all industries, especially those in high-risk occupations such as healthcare workers. Our research team has investigated how wearable IoT devices can alleviate this problem by utilizing 802.11 wireless beacon frames broadcast from pre-existing access points in a building to achieve room-level localization. Notable improvements to this low-cost localization technique's accuracy are achieved via machine learning by implementing the random forest algorithm. With random forest, historical data train the model so that it can make better-informed decisions when tracking other nodes in the future. In this project, employees' and patients' locations while in a building (e.g., a healthcare facility) can be time-stamped and stored in a database. With this data available, contact tracing can be automated and accurately conducted, allowing those who have been in contact with a confirmed positive COVID-19 case to be notified and quarantined immediately. This paper presents the application of the random forest algorithm to broadcast frame data collected in February of 2020 at Sentara RMH in Harrisonburg, Virginia, USA. Our research demonstrates the combination of affordability and accuracy possible in an IoT beacon frame-based localization system that allows for historical recall of room-level localization data.
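Once room-level locations are time-stamped and stored, the automated contact-tracing step described in the abstract reduces to finding overlapping stays in the same room. The sketch below illustrates that step only; it does not cover the random forest localization itself. The `Visit` record, the 15-minute minimum overlap, and the sample visits are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Visit(NamedTuple):
    """One time-stamped, room-level location record (hypothetical schema)."""
    person: str
    room: str
    start: datetime
    end: datetime


def contacts_of(case, visits, min_overlap=timedelta(minutes=15)):
    """Return people who shared a room with `case` for at least min_overlap."""
    case_visits = [v for v in visits if v.person == case]
    exposed = set()
    for other in visits:
        if other.person == case:
            continue
        for mine in case_visits:
            if other.room != mine.room:
                continue
            # Overlap is negative when the two stays do not intersect.
            overlap = min(mine.end, other.end) - max(mine.start, other.start)
            if overlap >= min_overlap:
                exposed.add(other.person)
    return exposed


def at(hour, minute):
    """Build a timestamp on an arbitrary, assumed date."""
    return datetime(2020, 2, 10, hour, minute)


visits = [
    Visit("nurse_a", "room_101", at(9, 0), at(9, 45)),
    Visit("patient_b", "room_101", at(9, 20), at(10, 0)),   # 25 min overlap
    Visit("tech_c", "room_101", at(9, 40), at(10, 30)),     # 5 min overlap
    Visit("patient_d", "room_102", at(9, 0), at(10, 0)),    # different room
]
exposed = contacts_of("nurse_a", visits)
```

In a deployment the visits would come from the localization database rather than a literal list, and the overlap threshold would follow whatever exposure definition the facility adopts.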
10:00
Gaurav Anand, Arshiya Ansari, Beverly Dobrenz, Yibo Wang, Brandon Jacques and Per B. Sederberg (University of Virginia, USA)
Brain Computer Interface (BCI) applications employ machine learning to decode neural signals through time to generate actions. One issue facing such machine learning algorithms is how much of the past they need to decode the present. DeepSITH (Deep Scale-Invariant Temporal History) is a deep neural network with layers inspired by how the mammalian brain represents recent vs. less-recent experience. A single SITH layer maintains a log-compressed representation of the past that becomes less accurate with older events, unlike other approaches that maintain a perfect copy of events regardless of how far in the past they occurred. By stacking layers of this compressed representation, we hypothesized that DeepSITH would be able to decode patterns of neural activity from farther in the past and combine them efficiently to guide the BCI in the present. We tested our approach with the Kaggle "Grasp and Lift challenge" dataset. This motor movement dataset has 12 subjects, 10 series of 30 grasp and lift trials per subject, with 6 classes of events to decode. We benchmark DeepSITH's performance on this dataset against another common machine learning technique for integrating features over extended time scales, long short-term memory (LSTM). DeepSITH reproducibly achieves higher accuracy in predicting motor movement events than LSTM, and it also takes significantly fewer epochs and less memory to train. In summary, DeepSITH can efficiently process more data, with increased prediction accuracy and learning speed. This result shows that DeepSITH is an advantageous model to consider when developing BCI technologies.

#### Data 5

Room: Data Track
Chair: Cheng Wang (University of Virginia, USA)
9:45
Jay Choi (School of Data Science, USA); Brian A Foster-Pegg, Joel A Hensel and Oliver Schaer (University of Virginia, USA)
With the development of graph databases, organizations can utilize this technology to enhance human capital allocation by better understanding and connecting employee skillsets with the requirements of positions. Specifically, by storing data in the form of a knowledge graph, organizations are enabled to profile the competencies of their employees and optimize the deployment of human capital to the company's objectives. This study explores data provided by a large engineering organization which merges employee data, including project assignment and skills, with a public library of competency profiles from O*NET. The objective is to explore employee skills profiling, optimize project staffing, and identify employees best suited for upskilling through the use of graph databases and machine learning algorithms. The findings show that knowledge graphs present an opportunity for organizations to better understand their workforces and more optimally allocate and strengthen their human capital.
10:00
Hannah B Frederick, Haizhu Hong and Margaret A Williams (University of Virginia, USA); Amanda Christine West (UVa, USA); Brian Wright (University of Virginia, USA)
Our work aims to aid in the development of an open source data schema for educational interventions by implementing natural language processing (NLP) techniques on publications within What Works Clearinghouse (WWC) and the Education Resources Information Center (ERIC). A data schema demonstrates the relationships between individual elements of interest (in this case, research in education) and collectively documents elements in a data dictionary. To facilitate the creation of this educational data schema, we first run a two-topic latent Dirichlet allocation (LDA) model on the titles and abstracts of papers that met WWC standards without reservation against those of papers that did not, separated by math and reading subdomains. We find that the distributions of allocation to these two topics suggest structural differences between WWC and non-WWC literature. We then implement Term Frequency-Inverse Document Frequency (TF-IDF) scoring to study the vocabulary within WWC titles and abstracts and determine the most relevant unigrams and bigrams currently present in WWC. Finally, we utilize an LDA model again to cluster WWC titles and abstracts into topics, or sets of words, grouped by underlying semantic similarities. We find that 11 topics are the optimal number of subtopics in WWC with an average coherence score of 0.4096 among the 39 out of 50 models that returned 11 as the optimal number of topics. Based on the TF-IDF and LDA methods presented, we can begin to identify core themes of high-quality literature that will better inform the creation of a universal data schema within education research.
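As an illustration of the TF-IDF step described in the abstract above, the following Python sketch scores unigrams and bigrams across a small set of documents. It is not the authors' code: the whitespace tokenizer, the unsmoothed tf × log(N/df) weighting, and the function names `ngrams` and `tfidf` are assumptions chosen for brevity.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercase unigrams up to n_max-grams as space-joined strings."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(docs, n_max=2):
    """Score every n-gram in every document with tf * log(N / df)."""
    grams_per_doc = [ngrams(d, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each n-gram appear?
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    scores = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        scores.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return scores
```

With this weighting, an n-gram that occurs in every document scores zero, while n-grams concentrated in a few documents score highest, which is why the method surfaces vocabulary that distinguishes one set of abstracts from another.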
10:15
Michael Bassilios, Ava Jundanian, Josh Barnard, Vienna Donnelly, Rachel Kreitzer, Stephen Adams and William Scherer (University of Virginia, USA)
In the world of college sports, the process of recruiting players is one of the most important tasks a coach must tackle. With only 6% of the 8 million high school athletes earning spots on NCAA teams, finding and selecting the right players can be incredibly challenging even with the availability of widespread data. Some sports, like football and basketball, have found great success using predictive analytics to estimate success in college. These efforts, however, have not yet been extended to other sports, such as golf. Given the vast amount of data available to the public on junior golfers, there is clear potential to bring analytics to college golf recruiting. We partnered with GameForge, a leading golf analytics company, to create a recommendation tool for college coaches, one that leverages the already existing data on high school and collegiate golfers and a variety of predictive models to display athletes we believe would best fit in a certain college program. A systems analysis approach was taken to find the factors that most accurately predict a high school player's success in college golf. This was done with a variety of models: forecasting the probability that a high school athlete becomes a top-ranked college golfer, finding players whose performance is similar to that of a desired player, and predicting a junior golfer's scoring performance and development during the remainder of their high school career and during college. Using these models, we identified several factors that are predictive of player similarity and performance. The research team iteratively developed these models to be used in conjunction with each other in order to provide meaningful and understandable recommendations to a college coach on which players they should recruit to maximize success.
10:30
Michael Pajewski, Chirag A Kulkarni, Nikhil Daga and Ronak Rijhwani (University of Virginia, USA)
Over 600,000 people go missing each year in the United States. These events can cover situations anywhere from a young child going missing in a park to a group of hikers getting lost on a trail. dbS Productions has collected data on 16,863 searches over the past 30 years to generate an international database for use by search and rescue teams. The data recorded include a variety of fields such as subject category, terrain, sex, weight, and search hours. The data set is currently being underutilized by search and rescue teams due to a lack of applicable predictive tools built upon the aforementioned data. These search and rescue teams are also often volunteer-based and face great resource limitations in their operations. A tool is needed to predict the probability of a missing person's survival for the operation's coordinator to aid in resource allocation and the decision to continue or terminate search missions, which can be costly. This paper details an effort to create such a survivability predictor to help with this goal. We applied a Boosted Tree implementation of an Accelerated Failure Time (AFT) model to estimate the probability that a lost person would be found over time, given personal information about the subject, the location, and weather. We engineered several categorical variables and obtained weather data through the National Weather Service API to improve the model performance. Our engineered model recorded a C-index score of 0.67, which indicates a relatively robust model, where the industry standard considers 0.7 "good" and 0.5 on par with random guessing. An analysis of the feature weights suggested that subject age, temperature, population density, mental fitness, and sex are the most critical indicators of survival in a missing person incident. Future work should involve incorporating more specific weather data, such as wind speeds and precipitation, into the model to improve prediction accuracy.
Further research directions may include building a geo-spatial model to predict potential paths taken by a missing person based on initial location and the same predictors used in the survivability model.
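To make the C-index reported in the abstract above concrete, the sketch below computes Harrell's concordance index for right-censored data. It is a generic illustration, not the authors' evaluation code: the function name `concordance_index`, the argument layout, and the convention that a higher score means an earlier expected event are assumptions.

```python
def concordance_index(times, risk_scores, events):
    """Harrell's C: share of comparable pairs that the model orders correctly.

    times       -- observed time for each subject
    risk_scores -- model output; assumed here that HIGHER means an EARLIER event
    events      -- 1 if the event was observed, 0 if the record is censored
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's recorded time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model that ranks every comparable pair correctly scores 1.0, and one that assigns the same score to everyone scores 0.5, which is the "random guessing" baseline the abstract mentions.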

#### Optimization, Simulation, and Decision Analysis 5

Room: Optimization, Simulation, & Decision Analysis
Chairs: Debajyoti (Debo) Datta (University of Virginia, USA), Mehrdad Fazli (University of Virginia, USA)
9:45
Keyu Chen, Benjamin C Cosgro, Oretha Domfeh and Alex Stern (University of Virginia, USA)
In this paper, we leverage non-survey data (i.e., news articles), natural language processing (NLP), and deep learning methods to detect and measure innovation, ultimately enriching innovation surveys. Our dataset is composed of 1.9M news articles published between 2013 and 2018 acquired from Dow Jones Data, News, and Analytics. We use Bidirectional Encoder Representations from Transformers (BERT), a neural network-based technique for NLP pre-training developed by Google. Our methods involve: (i) utilizing Google's BERT as a binary classifier to identify articles that mention innovation, (ii) developing BERT's named-entity recognition algorithm to extract company names from these articles, and (iii) leveraging BERT's question-answering capabilities to extract company and product names. As a result, we obtain innovation indicators, i.e., company innovations in the pharmaceutical sector.
10:00
Elizabeth Korte, Courtney Laughlin, Thomas Peters, Lillian Stiles, Robert J Riggs, Kimberly Dowdell and Karen Measells (University of Virginia, USA)
In 2020, health systems have been affected by the novel coronavirus (COVID-19) pandemic, causing an influx of COVID-19 related visits and a sharp decline in non-emergency and elective visits. To mitigate the spread of COVID-19, healthcare systems - including the University of Virginia Health System - reduced ambulatory visits and implemented various social distancing measures, resulting in a drastic change in the patient admittance process. The focus of this work is to accurately characterize the effect of COVID-19 on one of the UVA Internal Medicine, Primary Care clinics, and where possible, to refine and optimize patient flow through the appointment process while accommodating public health restrictions. To achieve these goals, the team adopted a systems approach, which involves the iterative process of problem identification, analysis, and testing recommendations. The first phase of the project focused primarily on establishment of the current state and problem identification. The appointment process contains six major elements: scheduling, sign-in/remote registration, check-in, rooming, check-out, and telemedicine. Through extensive discussions with the clients, surveys of clinic staff, in-person observation, and data collection and analysis, the capstone team was able to understand the pandemic's impact on the clinic's patient flow and identify key problem areas at each stage in the appointment process. The team then used these insights to develop informed recommendations for these pain points. The second phase of the project consisted of formulating trials within UVA health restrictions and guidelines to test the impact of our recommendations. Through a pilot of a new remote registration process, on-time patients increased from 68% to 75%, nurse perceived workload decreased significantly, and the arrival process became more predictable. 
From this work, the team was able to develop a more generic framework for how health systems might assess and address patient flow issues under normal circumstances as well as during future pandemics.
10:15
Monica Uribe-Francisco, Olivia Hoerle, Joshua Groover and Olivia Zarroli (The George Washington University, USA)
The brewing process for beer production both utilizes and emits carbon dioxide (CO2). Instead of capturing and cleaning CO2 produced in the process, most microbreweries emit it into the atmosphere. Microbreweries have the potential to save money and reduce their carbon footprint by capturing, cleaning, and reusing CO2 from the fermentation process instead of purchasing it from an outside supplier. CO2 capture and cleaning systems are now commercially available but are costly. Given the financial drawback that microbreweries face, a decision support tool (DST) is developed with a dashboard that aims to provide a feasibility assessment for implementing a specific carbon capture and utilization system (CCUS). The dashboard is easy to use and accessible, and it provides relevant information for a wide range of variables, including direct and indirect costs and benefits, to support assessments such as life cycle cost-benefit analysis for the CCUS, sensitivity analysis, and more. The DST is tested and validated through expert elicitation and simulation. Its design relies on the system simulation software Vensim for the development of a back-end equation derivation. The front-end is hosted on a user-friendly dashboard. Further testing and validation can be conducted to improve the front-end design and usability of the system.
10:30
Felipe Nedopetalski and Joslaine Cristina de Freitas (Universidade Federal de Jatai, Brazil)
Process mining can be understood as a tool for extracting useful information from processes that have already run and for making decisions that improve process performance. The three main techniques when applying process mining to event logs are process discovery, process enhancement, and conformance checking. Among the many applications of process mining, this paper uses it to discover a model from event logs generated by simulating the "Handle Complaint Process" Workflow net, which is based on a p-time Petri net model with hybrid resources. The net discovered with process mining must be similar to the original one because the event logs used to generate it are created from the simulation. Process mining has become increasingly useful for Workflow nets, especially when it is combined with simulation. Process mining can discover processes from event logs, find deviations, and produce a better workflow, while simulation can test new scenarios and hypotheses. With both working together, product owners can approach process excellence. The "Handle Complaint Process" Workflow net based on a p-time Petri net model with hybrid resources aims to solve the real-time scheduling problem of Workflow Management Systems. The approach taken in this work uses discrete and continuous resources and real time to decide when to fire a transition in the Workflow net. To generate the event logs from the simulation of the Workflow net, functions were added to capture the identification number of each token, the path it took, the timestamp at the moment each transition fired, and the person or system responsible for the activity. This Workflow net was simulated using CPN Tools. The logs generated from the simulation were converted using the ProM Import tool, and the process mining discovery technique was applied using ProM.
Event logs of a business process model offer a way to detect deviations from the expected behavior. Based on these deviations, the process can be changed in order to achieve excellence. The logs from the p-net model with hybrid resources aim to simulate human behavior more faithfully. Because the model generated from the logs is similar to the original one, the conversion is correct. As future work, we will compare a real event log with the results of this work to evaluate the efficiency of simulating a process model with hybrid resources.

#### Infrastructure, Networks, and Policy 5

Room: Infrastructure, Networks, & Policy Track
Chairs: Moeen Mostafavi (University of Virginia, USA), Samarth Singh (University of Virginia, USA)
9:45
Ryan Barnett, Christopher M Hume and Andrew Taylor (University of Virginia, USA)
Too many vehicle-pedestrian accidents occur today. As autonomous vehicles are introduced into society, they will bring additional threats to pedestrians. It is important to find the best way for pedestrians to interact with both human-driven and driverless vehicles in order to keep pedestrians safe. As technology advances, pedestrian safety deserves more attention, given the many possibilities for lapses in connectivity and miscommunication between pedestrians and vehicles. The focus of our investigation is to determine the feasibility of intersection crossings that rely on pedestrian-to-vehicle connectivity. As vehicles increase in complexity with respect to their interaction with other vehicles, pedestrians, and the environment, a future concept may arise for an intersection that requires no physical infrastructure: autonomous vehicles may be built to recognize pedestrians and to stop accordingly, and signal timing may be understood in the form of a dataset uploaded periodically. If this concept is developed, how susceptible would it be to something such as a signal loss? What would the impacts on traffic and pedestrian buildup be? Our work will combine a generalized research project with a simulated model of a simple intersection. The group has researched the levels of interconnectedness of autonomous vehicles and their connection to pedestrians, and will conduct research on potential failures (technological or natural) that may impede the success of a non-signalized autonomous intersection. The system in question will be designed as a one-way, one-lane street with multiple crossings. Vehicles will arrive at a rate of roughly 600 vph (vehicles per hour) and pedestrians at a rate of roughly 150 per hour. Vehicle and pedestrian arrivals will be randomized.
In order to cross, pedestrians must "signal" to vehicles that they are in the crossing area (in the real world, this could be done on a mobile phone) so that the vehicles will stop for them. Intermittently, there will be either a "signal loss," in which pedestrians will no longer be able to signal to the vehicles that they are in the area, or a "connection loss," in which vehicles will not be able to receive signals from the pedestrians, forcing the vehicles to slow down because they cannot tell where pedestrians are. The system will output queue timing and density values for the vehicles and pedestrians. We expect to gather data on average wait times and flow of pedestrians and vehicles over a simulated time frame of a month. This data will be benchmarked against that provided by the VDOT regarding pedestrian and vehicle flow through signalized intersections. We will also collect data on the short gap time (the amount of time between cars in which pedestrians feel safe to cross) and the time pedestrians take to cross crosswalks, in order to create parameters for maximum allowable signal down times and to explore potential fail-safe procedures.
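The randomized arrivals described in the abstract above could be generated in several ways; the abstract does not say which. The sketch below assumes a Poisson arrival process (exponentially distributed gaps between arrivals), a common choice for traffic simulation, and uses the stated rates of 600 vehicles and 150 pedestrians per hour. The function name `simulate_arrivals` is hypothetical.

```python
import random


def simulate_arrivals(rate_per_hour, duration_hours, rng):
    """Return increasing arrival times in hours, assuming Poisson arrivals."""
    t = 0.0
    arrivals = []
    while True:
        # Gap to the next arrival is exponential with mean 1 / rate.
        t += rng.expovariate(rate_per_hour)
        if t >= duration_hours:
            return arrivals
        arrivals.append(t)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    vehicles = simulate_arrivals(600, 1.0, rng)     # roughly 600 vph
    pedestrians = simulate_arrivals(150, 1.0, rng)  # roughly 150 per hour
    print(len(vehicles), "vehicles and", len(pedestrians), "pedestrians in one hour")
```

Arrival streams like these would feed the queueing logic that tracks wait times during signal-loss and connection-loss periods.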
10:00
Grace Glaubit, Katie Kleeman, Noelle Law, Jeremiah Thomas, Shijie Gao, Rahul Peddi, Esen Yel and Nicola Bezzo (University of Virginia, USA)
Unmanned ground vehicles (UGVs) traversing paths in complex environments may have to adapt to changing terrain characteristics, including different friction, inclines, and obstacle configurations. In order to maintain safety, vehicles must make adjustments guided by runtime predictions of future velocities. To this end, we present a neural network-based framework for the proactive planning and control of an autonomous mobile robot navigating through different terrains. Using our approach, the mobile robot continually monitors the environment and the planned path ahead to accurately adjust its speed for successful navigation toward a desired goal. The target speed is selected by optimizing two criteria: (1) minimizing the rate of change between predicted and current vehicle speed and (2) maximizing the speed while staying within a safe distance from the desired path. Additionally, we introduce random noise into the network to model sensor uncertainty and reduce the risk of predicting unsafe speeds. We extensively tested and validated our framework on realistic simulations in Gazebo/ROS with a UGV navigating cluttered environments with different terrain frictions and slopes.
10:15
Luther A Bell (St. John's University, USA); Puya Ghazizadeh (St. John's University, USA); Samy S. El-Tawab (James Madison University, USA); Aida Ghazizadeh (Old Dominion University, USA)
Vehicular Clouds derive from the cloud computing concept. Vehicles standing in a parking lot can pool their computing, sensing, communication, and physical resources. Vehicular Clouds were motivated by the realization that present-day vehicles are equipped with powerful onboard computers, powerful transceivers, and an impressive array of sensing devices. As it turns out, most of the time, the computing, storage, and communication resources available in our vehicles are chronically under-utilized. We are putting these resources to work in a meaningful way to provide computation power, which plays an essential role for service providers, transportation systems, health care, and online education in our modern society. Vehicular Clouds provide computation power to users based on a resource-sharing model. In this model, vehicle owners rent out their onboard computation power to receive incentives in the form of payments or free parking spots. To use this computation power, there should be a way to submit jobs to the system. In this work, we develop a framework for the vehicular cloud to manage the onboard computation resources of the vehicles and the computation tasks that users submit. This framework will be available to users in a software system called Vehicular Cloud Real-Time System (VCRTS). Random arrival and departure of vehicles in vehicular clouds can impact the computation nodes' availability and lead to an interruption in the computation process. We design and implement the VCRTS based on a fault-tolerant approach to prevent interruption in job execution. Our approach uses a redundancy mechanism to handle the random nature of arrival and departure of the vehicles that are used as computation nodes.
10:30
Daoyi Li, Yuzhao Qiang and John H. Mott (Purdue University, USA)
The mountainous landscape in western China provides cargo delivery unmanned aerial vehicles (UAVs) with a potentially enormous market, and Chinese logistics companies are developing and testing prototypes of such UAVs. Some prototypes, for example, the Feihong-98 by SF-express, may enter service in 2021. Despite the rapid development of heavy UAVs, the construction of associated infrastructure and the formulation of corresponding laws and regulations are disturbingly backlogged: a lack of airports in western China limits the operation of the UAVs, and there are only basic regulations regarding operation and maintenance as of 2019. We analyzed China's current air traffic control (ATC) system, relevant regulations, conditions of general aviation (GA) airports, automatic dependent surveillance-broadcast (ADS-B) systems, and cellular networks, and found existing problems in the systems, including inadequate airspace classification and workload distribution. We also conducted failure mode and effect analysis (FMEA) on UAVs and control stations to better analyze the problems. Based on the information we obtained and on China's social and political conditions, we explored solutions that provide a preliminary outlook for a new ATC system targeting heavy delivery UAVs. Such solutions include reassigning air traffic control duties and applying 5G cellular technologies in air traffic surveillance and management.

### Friday, April 30 10:55 - 11:55 (America/New_York)

#### Workshop: Machine Learning Introduction for Newbies

Room: Main Zoom Room
Chair: Bethany Brinkman (Sweet Briar College, USA)
10:55 Machine Learning Introduction for Newbies
Wenqiang Chen and John Stankovic (University of Virginia, USA)
This workshop is designed for newbies who are totally new to Machine Learning (ML) but are interested in incorporating ML into their projects. It presents a high-level overview and the primitive ideas of Machine Learning/Artificial Intelligence. It also offers an understandable and straightforward introduction to similarity distance, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees, and Random Forests. Through these basics, the workshop demystifies some novel ML applications and leads to more advanced ML algorithms. Last but not least, some ML tools and platforms are introduced for newbies to start getting their feet wet. In short, the overall goal of this workshop is to help beginners, who know very little or nothing about machine learning, apply ML in their projects.
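For readers who want a taste of the topics listed above, here is a minimal K-Nearest Neighbors sketch in plain Python that combines a similarity distance (Euclidean) with a majority vote. It is not workshop material: the function names `euclidean` and `knn_predict`, the default `k=3`, and the toy data in the usage lines are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated clusters labeled "a" and "b".
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.2, 0.1)))
    print(knn_predict(points, labels, (5.5, 5.5)))
```

KNN needs no training step, which makes it a convenient first algorithm: all of the work happens at prediction time, when distances to the stored examples are computed.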

### Friday, April 30 12:00 - 12:30 (America/New_York)

#### Awards and Closing Ceremony

Awards
Room: Main Zoom Room

Come to hear who won the Best Paper Awards for each track!