Graph For Better X
- Identify a problem you want to tackle with graph analytics and invite others to follow.
X: Design Your Problem Statement
Graph For Better Earth
- Improve living conditions on Earth by focusing on sustainability
1: Monitor Impact Of Climate Warming In The Arctic
Graph For Better Finance
- Help people or organizations prosper
2: Predict Global Crises
Graph For Better Health
- Find ways to enable people to live healthier lives
3: Predict Next Wave Of COVID
4: Find Novel Drug Treatments
5: Detect Early COVID-19 Mutated Variants
Graph For Better Learning
- Discover innovations that allow people to learn more quickly and more impactfully
6: Foster Critical Thinking
7: Reduce The Noise Of News Search
Graph For Better Living
- Make life more enjoyable for humanity
8: Manage Your Personal Identity
Graph For Better Systems
- Increase efficiencies across global processes
9: Develop Effective Public Transportation Systems
Graph For Better World
- Tackle social issues to make the world a better place to live
10: Enable Search For United Nations Sustainable Development Goals
11: Create STEM Opportunities for Women
12: Find Ethically-Sourced Goods
Graph For Better X
Identify a problem you want to tackle with graph analytics and invite others to follow.
X: Design Your Problem Statement
Author: You
The mission of the Graph For All Million Dollar Challenge is to ignite the passion and inventiveness of people around the world to solve real problems using graph analytics. For that reason, we are inviting you to design your own problem statement that you want to solve.
If you select this as an option, you will be asked to describe the problem you are trying to overcome as part of your submission.
Your problem statement should cover the following:
Identify the problem you are trying to solve
Give a two- to three-sentence explanation of why the problem is important and who it affects. Briefly describe what is currently being done and why it is not enough. Also describe the worst-case scenario if the problem continues unchecked.
What is the challenge?
Talk about what has been tried: X has failed in the past, but now we have graph technology to take another shot. Outline a grand vision for the solution and describe your optimal outcome.
Resources
- Review other problem statements submitted by domain experts for inspiration and guidance
- Request a Problem Statement Consultation.
Graph For Better Earth
Improve living conditions on Earth by focusing on sustainability
1: Monitor Impact Of Climate Warming In The Arctic
Author: Alexey Portnov, PhD Research Associate at University of Texas at Austin - Institute for Geophysics
The Problem
Anthropogenic climate change has accelerated over the past several decades and affects people's well-being through increasingly catastrophic weather events, rising sea levels, and drought. Particularly vulnerable are the polar regions, where increasing greenhouse gas concentrations produce a mean annual temperature increase 2-3 times greater than in the rest of the world. This effect, called "Arctic amplification," directly harms nature and wildlife through sea-ice melting and permafrost thawing. It is also a significant risk factor for the indigenous communities and engineering infrastructure of the polar regions.
One of the critical consequences of the global temperature rise is thawing Arctic permafrost. Permafrost regions are extensive (thousands of kilometres wide), and they hold tremendous amounts of organic carbon, which is released into the atmosphere as methane, a greenhouse gas, when permafrost disintegrates. This process has recently intensified, and we observe it, for example, through the appearance of explosive gas blow-out craters in the Canadian, US, and Russian Arctic. It is a rapid and hazardous process: a flat (or slightly doming) earth surface turns into a deep, wide crater in a matter of hours, and over the following year the crater fills with water and becomes a thermokarst lake.
The Challenge
Monitoring the emergence of thermokarst lakes is important for understanding the impact of climate warming and assessing geo-hazard risks in the polar regions. Such craters and lakes (which can be tens to thousands of metres wide) are clearly visible in satellite images. The challenge is to use available public satellite databases covering the last 20-30 years (depending on image quality and availability) to capture the differences in images before and after lakes appear and to produce a time series for newly generated lakes, as sketched below. The locations and times of origin of thermokarst lakes should be marked and catalogued, allowing their dynamics to be monitored and fed into various climate models. It makes sense to use mostly summer-month imagery. Potential geographic regions are the Yamal peninsula, the Tuktoyaktuk peninsula, or any other Arctic region with available satellite data.
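To make the image-differencing idea concrete, here is a minimal Python sketch, assuming you already have co-registered green and near-infrared band arrays from two summer acquisitions (a real pipeline would load these from Landsat or Sentinel scenes with a raster library such as rasterio). The NDWI threshold of 0.2 is an illustrative choice, not a calibrated value.

```python
# Minimal sketch: flag newly formed water bodies (candidate thermokarst
# lakes) by differencing NDWI water masks from two summer acquisitions.
# Assumes co-registered numpy arrays of identical shape.
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Water Index: (G - NIR) / (G + NIR)."""
    return (green - nir) / (green + nir + 1e-9)

def water_mask(green, nir, threshold=0.2):
    """Pixels with NDWI above the threshold are treated as open water."""
    return ndwi(green, nir) > threshold

def new_lakes(green_t0, nir_t0, green_t1, nir_t1):
    """Water at time t1 that was not water at time t0."""
    return water_mask(green_t1, nir_t1) & ~water_mask(green_t0, nir_t0)

# Toy example: a 'lake' appears in the center of a 5x5 scene.
g0 = np.full((5, 5), 0.3); n0 = np.full((5, 5), 0.4)  # dry tundra
g1 = g0.copy(); n1 = n0.copy()
g1[2, 2], n1[2, 2] = 0.5, 0.1                          # water signature
print(new_lakes(g0, n0, g1, n1).astype(int))
```

Running the same differencing over a stack of yearly scenes, and recording the first year each pixel cluster turns to water, would yield the time-of-origin catalogue described above.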
Resources
- Helpful reading and images on the announced topic:
- Potential satellite databases:
Graph For Better Finance
Help people or organizations prosper
2: Predict Global Crises
Domain Expert: Haris Dindo, Chief Technology Officer at SHS Asset Management
The Problem
Throughout history, we have seen how one country's decisions and behaviors can affect others, if not the whole world. The most recent example is the crisis of 2007/08, when the collapse of the US housing market triggered a global crisis. Another, more recent, example is the emergence of the COVID-19 pandemic. Beyond these, there are myriad other disruptive crises that spread among countries. Since history tends to repeat itself, one cannot help but wonder whether the effects of another country's crisis on one's own population could be minimized.
The Challenge
Given that all countries are connected to each other, construct a graph to see how much one country's crisis will affect the others. Treat countries as nodes and their relationships as links. Leverage the different socio-economic and macroeconomic aspects captured for each country over time in order to predict a crisis, or to predict which other countries will be affected by one country's crisis. This would help countries minimize the effects of another country's crisis on their own citizens.
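As one way to get started, here is a minimal Python sketch of crisis contagion on a country graph. The countries, exposure weights, and damping factor are purely illustrative assumptions; real edge weights would be derived from data such as the trade and indicator sources below.

```python
# A minimal sketch of crisis contagion: countries are nodes, and each
# directed edge weight is a hypothetical share of the target's economy
# exposed to the source country.
import networkx as nx

G = nx.DiGraph()
# Hypothetical exposure weights, for illustration only.
edges = [("US", "MX", 0.30), ("US", "DE", 0.15),
         ("DE", "FR", 0.25), ("MX", "US", 0.10)]
G.add_weighted_edges_from(edges)

def propagate(graph, origin, shock=1.0, damping=0.5, rounds=3):
    """Spread a shock along trade links; each hop transmits
    damping * edge_weight of the upstream stress."""
    stress = {n: 0.0 for n in graph}
    stress[origin] = shock
    for _ in range(rounds):
        nxt = dict(stress)
        for u, v, w in graph.edges(data="weight"):
            nxt[v] = max(nxt[v], stress[u] * w * damping)
        stress = nxt
    return stress

print(propagate(G, "US"))  # stress level reached in each country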
Resources
Following is a list of data resources that can be used to address the challenge. But there is definitely much more: from specific macroeconomic indicators to perceived sentiment around countries and their economies. Be creative!
- Foreign Trades between countries (global and per sector): https://wits.worldbank.org/Default.aspx?lang=en
- Various global indicators: https://data.worldbank.org/indicator/
- Global crises data: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx
- Macroeconomic indicators for each country: Macroeconomic data
Graph For Better Health
Find ways to enable people to live healthier lives
3: Predict Next Wave of COVID
Author: David DeCaprio, Founder and CTO of ClosedLoop AI
The Problem
COVID response efforts focus on using the wide array of publicly available data on COVID transmission and spread to better understand the virus's dynamics, with the hope of being able to predict and mitigate future infection spikes.
COVID infections have occurred in several waves of rapidly increasing and then declining infections, with hospitalizations and deaths lagging infections by predictable intervals. Many explanations have been proposed over the last two years for these waves, including new variants, weather, vaccinations, public health measures, and changes in individual decision making and risk tolerance. All appear to play a role, but none of these alone fully explains the rise and fall of cases over time, nor do we have good insight into when the next wave will come, how long it will last, or how severe it will be. Several insights have come from analyzing the progression of these waves within different countries and within different regions of a country (counties or zip codes within the US, for example).
The Challenge
Develop a solution to analyze prior COVID waves and model the progression of new infections. This approach could inform policy makers to more proactively institute restrictions ahead of impending waves, and to remove restrictions that are unrelated to the caseload increases or have simply outlived their usefulness. The solution could also be used by healthcare organizations to plan for capacity surges by rescheduling elective procedures, and by any organization planning a large gathering to have more insight into COVID-related adjustments to their plans.
Resources
- Use the many publicly available COVID resources to explore connections between these various factors and analyze how these waves have progressed, both geographically and over time.
- Read up on other approaches to this problem: https://www.news-medical.net/news/20210629/Can-COVID-19-waves-be-predicted-using-early-warning-signal-indicators.aspx
Dataset Example Resources
- Johns Hopkins COVID data - https://github.com/CSSEGISandData/COVID-19 - up-to-date, aggregated data on worldwide infections
- CORD-19 - https://www.semanticscholar.org/cord19 - computer-readable scientific papers on COVID-19
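To illustrate one simple starting point for the wave analysis described in the challenge, here is a hedged Python sketch of peak detection on a daily-cases series. The synthetic data stands in for counts you might aggregate from the Johns Hopkins repository above; the smoothing window and peak parameters are illustrative assumptions.

```python
# A minimal sketch of wave detection: smooth a daily-cases series with
# a rolling mean, then locate wave peaks with scipy's peak finder.
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
days = np.arange(365)
# Two synthetic waves plus noise, standing in for real case counts.
cases = (800 * np.exp(-((days - 90) / 25) ** 2)
         + 1500 * np.exp(-((days - 270) / 30) ** 2)
         + rng.normal(0, 30, days.size)).clip(min=0)

smoothed = np.convolve(cases, np.ones(7) / 7, mode="same")  # 7-day mean
peaks, _ = find_peaks(smoothed, prominence=200, distance=30)
for day in peaks:
    print(f"wave peak around day {day}: ~{smoothed[day]:.0f} cases/day")
```

Comparing peak timing and shape across regions, rather than within a single series, is where the graph structure of the problem comes in.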
4: Find Novel Drug Treatments
Author: David DeCaprio, Founder and CTO of ClosedLoop AI
The Problem
This effort focuses on examining healthcare data to identify and learn from the "natural experiments" going on within the healthcare system as doctors and patients seek novel ways to address their illnesses.
Drugs and other medical treatments are approved in carefully controlled clinical trials that are designed to answer very narrow questions about the safety and efficacy of those treatments for very specific conditions and patient groups. However, treatments are often effective for a wider range of uses than what they are officially approved for. Such “off label” usage is common. These treatments could be effective for patients, but often aren’t as well studied once the drugs are out in the market.
The US Centers for Medicare and Medicaid Services publishes a lot of public use data about healthcare in the United States. These data sets contain information about drugs prescribed and conditions treated. The data is not individually identifiable, but can be related through providers and facilities. This data could be linked to understand potential off label uses of existing treatments by identifying cases where drugs are being prescribed by providers or at facilities that don’t treat the primary use. Using this approach could allow us to generate new medical knowledge from the full healthcare system and not just from clinical trials.
The Challenge
Develop a solution that combines various healthcare data sources to understand patterns of diagnosis and treatment pointing to potential off-label usage of drugs. One example, sketched below, is clustering doctors with the drugs they typically prescribe and the diseases they typically treat, then comparing those results to known databases of the diseases associated with particular drugs. Outliers in this space provide potential cues to off-label usage. This could be used to identify interactions missing from those databases that could be useful to doctors, patients, and researchers.
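As a hedged illustration of the outlier idea, the sketch below compares each prescriber's specialty against the specialties that usually prescribe a drug. The toy rows and the 25% rarity threshold are assumptions standing in for the CMS prescriber files listed below.

```python
# Minimal sketch: flag prescriptions whose prescriber specialty is rare
# for that drug, as candidate off-label signals worth checking against
# clinical-trial and drug-indication databases.
import pandas as pd

rx = pd.DataFrame({
    "prescriber": ["A", "B", "C", "D", "E"],
    "specialty":  ["Cardiology", "Cardiology", "Cardiology",
                   "Dermatology", "Oncology"],
    "drug":       ["metoprolol"] * 5,
})

# For each drug, the share of prescriptions coming from each specialty.
shares = (rx.groupby(["drug", "specialty"]).size()
            .div(rx.groupby("drug").size(), level="drug"))

# Rows where the prescriber's specialty is rare (<25%) for that drug.
rx["specialty_share"] = rx.apply(
    lambda r: shares[(r["drug"], r["specialty"])], axis=1)
print(rx[rx["specialty_share"] < 0.25])
```

In a graph formulation, prescribers, drugs, and conditions become node types, and these rare specialty-drug edges are the outliers to surface.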
Resources
Dataset Example Resources:
- CMS Public Use Files -
- US Prescriber Data -
- Clinical trial results online - https://clinicaltrials.gov/ct2/results
- Dartmouth Health Atlas - https://www.dartmouthatlas.org/
5: Detect Early COVID-19 Mutated Variants
Author: Dr. Chun-Kit Ngan, Assistant Teaching Professor of Data Science at Worcester Polytechnic Institute
The Problem
Various severe COVID-19 mutated variants have been evolving over time, threatening the lives of the public.
Motivational Background
Coronaviruses are a family of single-stranded RNA viruses that can transmit infections between humans and have been documented for over 50 years. Coronavirus infectious disease 2019 (COVID-19) was initially reported as the Wuhan Coronavirus, or the 2019 novel coronavirus, in December 2019. Since then, COVID-19 cases have continued to increase, with several different mutated variants, e.g., Theta (lineage P.3), Alpha (lineage B.1.1.7 with E484K), Delta (lineage B.1.617.2), and Omicron (lineage B.1.1.529), to name a few. People with COVID-19 and its mutated variants may have a wide range of symptoms, e.g., trouble breathing, persistent chest pain, and inability to stay awake. These complications range from mild symptoms to severe illness that can ultimately result in death. Thus, an effective approach to detecting the emergence of these mutated variants in advance is needed so that medical therapists and researchers can intervene early and take prompt, appropriate actions regarding treatments and vaccines to mitigate adverse effects on the public.
The Challenge
One potentially feasible approach to early detection of emerging COVID-19 mutated variants is to monitor and observe abnormalities in the resolution of multiple symptoms over time. Currently, there are three conventional approaches to time series anomaly detection: (1) predictive confidence level, (2) statistical profiling, and (3) clustering-based unsupervised approaches. However, some of these methods (e.g., Auto-Regressive Moving Average and Auto-Regressive Integrated Moving Average) can only detect anomalies in a single time series, without considering the influence of other mutually related time series. Even though other approaches (e.g., Vector Auto-Regressive Moving Average, Vector Auto-Regressive Integrated Moving Average, and the Vector Error Correction Model) can detect anomalies over multivariate time series, they cannot capture the interrelations among those time series to produce a better, more precise detection. Thus, we need an approach that can achieve this, and we believe a graph-based approach can succeed at this problem.
Objective
The purpose of this project is to develop and advance a graph-based multivariate time series anomaly detection approach, built upon state-of-the-art graph-neural-network architectures, to detect the emergence of COVID-19 mutated variants earlier than is currently possible.
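As a simplified, non-neural stand-in for the graph-based detection described above, the sketch below links strongly correlated symptom series into a graph and scores each day by how far a series deviates from its graph neighbors. The synthetic series, correlation threshold, and three-sigma rule are all illustrative assumptions; a full solution would replace the neighbor-mean predictor with a graph neural network.

```python
# Minimal sketch of graph-based multivariate anomaly detection: build a
# correlation graph over symptom time series, predict each series from
# its neighbors, and flag days with large residuals.
import numpy as np

rng = np.random.default_rng(1)
T = 200
base = np.cumsum(rng.normal(0, 1, T))       # shared underlying trend
series = {                                   # symptom-mention signals
    "cough":   base + rng.normal(0, 0.5, T),
    "fever":   base + rng.normal(0, 0.5, T),
    "fatigue": base + rng.normal(0, 0.5, T),
}
series["fever"][150:160] += 8                # injected anomaly

names = list(series)
X = np.array([series[n] for n in names])
corr = np.corrcoef(X)

# Graph edges: series pairs with |correlation| above a threshold.
edges = {(i, j) for i in range(len(names)) for j in range(len(names))
         if i != j and abs(corr[i, j]) > 0.8}

# Anomaly score: deviation from the mean of a node's graph neighbors.
for i, name in enumerate(names):
    nbrs = [j for j in range(len(names)) if (i, j) in edges]
    if not nbrs:
        continue
    resid = np.abs(X[i] - X[nbrs].mean(axis=0))
    flags = np.where(resid > 3 * resid.std())[0]
    print(name, "anomalous days:", flags[:10])
```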
Resources
- Dataset Resource: https://github.com/echen102/COVID-19-TweetIDs
- Technology Resources: https://docs.tigergraph.com/home/
Graph For Better Learning
Discover innovations that allow people to learn more quickly and more impactfully
6: Foster Critical Thinking
Author: María Laura García, President and founder of GlobalNews® Group
The Problem
Human thought is prone to errors. In recent times, we have gotten better at understanding why those errors occur and how to exploit them. Modern advances in technology like artificial intelligence, big data, cloud computing, and blockchain can all be used to manipulate our cognitive biases in order to sell us products and services, or to make us value certain information more than other information. One of the most notable cases is the exploitation of confirmation bias by social media companies to create "filter bubbles" of opinions a person agrees with. This not only prevents us from being well informed, but ultimately leads to a lack of critical thinking and adds to polarization. If we only listen to and validate one way of thinking, and do not expand and question it, sooner or later we will become intolerant citizens. And when there is no more tolerance, the very foundations of our democratic coexistence begin to crack.
The Challenge
In a context in which humanity is facing major challenges and democracies are being questioned throughout the world, emerging technologies can represent either a threat or an opportunity. In this case, the key is to identify a way to use these technologies to break through the confirmation bias and the filter bubbles enhanced by social media platforms. The aim is to foster critical thinking in digitally literate citizens who access and critically relate to different types of information, and who increasingly engage in dialogue with other points of view and with those who think differently. The goal is not to come up with an alternative business model for digital platforms or to eliminate the confirmation bias of the human mind, which would be almost impossible. Rather, it is to empower citizen-users with the necessary tools to be able to identify the presence of these biases and to consciously seek diverse perspectives and opinions on the same topic.
In order to understand how confirmation bias affects a given user on social media, a possible approach would be to build a model that analyzes clusters of accounts that publish and repost similar articles (especially using public social media data, for example via the Twitter API) and then, given the user's account, find which media sources are most prevalent in their community (up to 2nd-level connections). Then, using similarity or topic analysis of articles published on a news API (for example https://newscatcherapi.com/ or https://newsapi.org/ ), suggest articles that are similar to those the user is seeing in their feed (so as to maintain interest) but that come from sources not usually present in their community. A minimal sketch of this idea follows.
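Here is a minimal Python sketch of the suggestion step under strong simplifying assumptions: the toy article records stand in for data pulled from a news API, TF-IDF cosine similarity stands in for a full topic model, and the similarity threshold is arbitrary.

```python
# Minimal sketch: find articles topically similar to the user's feed
# but published by sources outside their usual community.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    ("source_a", "central bank raises interest rates to fight inflation"),
    ("source_a", "inflation squeezes household budgets this winter"),
    ("source_b", "why the central bank says inflation will slow next year"),
    ("source_c", "local team wins championship after dramatic final"),
]
user_sources = {"source_a"}            # the user's usual community

texts = [t for _, t in articles]
sim = cosine_similarity(TfidfVectorizer().fit_transform(texts))

seen = [i for i, (s, _) in enumerate(articles) if s in user_sources]
for i, (src, text) in enumerate(articles):
    if src in user_sources:
        continue
    score = max(sim[i, j] for j in seen)   # closeness to the user's feed
    if score > 0.15:                       # similar topic, new source
        print(f"suggest from {src}: {text!r} (similarity {score:.2f})")
```

The sports story from source_c is never suggested: it is a new source, but it shares no topic with the user's feed, so it would not sustain interest.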
Resources
Books and articles
- Garton Ash, T. (2017). Free Speech: Ten Principles for a Connected World.
- Pariser, E. (2017). The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think.
- Pariser: https://medium.com/@10797952/the-causes-and-effects-of-filter-bubbles-and-how-to-break-free-df6c5cbf919f
- Zuboff, S. (2018). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power.
- Friedman-Wheeler, D. (2020). Democracy requires us to work on our biases – all of them. The Fulcrum. Available at: https://thefulcrum.us/amp/democracy-requires-us-to-work-on-our-biases-all-of-them-2645473111
- Lu, M. (2020). 11 Cognitive Biases That Influence Political Outcomes. Visual Capitalist. Available at: https://www.visualcapitalist.com/11-cognitive-biases-influence-politics/
Possible data sources
- Twitter API Documentation | Docs | Twitter Developer Platform
- https://newscatcherapi.com/
- https://newsapi.org/
Research institutes
- AI Now Institute
- Global Freedom of Expression Columbia lab
- Reuters Institute for the Study of Journalism
7: Reduce The Noise Of News Search
Author: Ashleigh Faith, Director, Knowledge Graph and Semantic Search at EBSCO
Problem Statement
Google, while a wonderful resource for quick-fix questions, starts to repeat search results after the third page; this is especially true for news articles. A big reason for this is duplicate resources: common sister agencies like the Associated Press, and all the newspapers that use its articles, as well as reshares and reposts, artificially inflate the volume of an article/post and its apparent importance. Re-posts and re-shares are also often changed slightly, so Google does not see them as duplicates. This inflates the importance of some posts (going "viral" unnecessarily) and produces a noisy Google search experience that may hide more relevant news articles from end-users.
The Challenge
How can news articles with the same content be identified and associated with each other in order to prevent inflation of information importance? Take cues from copyright detection or song recognition as you design your solution. Attempt to identify duplicate news articles that you might scrape from Google or internet search results and what sources those articles commonly come from. How can this information be used to better enable the public to make sure they're getting the most important and diverse information?
Possible Approach
The desired state would likely be a hyper-node graph (https://aclanthology.org/2020.emnlp-main.596.pdf) that represents the common metadata for a cluster of duplicate or near-duplicate articles/posts and how their metadata relates to one another (with a similarity score, so some data science would be needed here). The individual articles and their metadata would be clustered together as relations to the hyper-node. Each hyper-node would in effect represent all versions of the individual articles and posts that are duplicates and give a normalized representation of the article/post. This hyper-node and its metadata can then be used to group articles/posts together in a search application, minimizing noisy search for news articles/posts and helping end-users identify whether an article/post is actually "going viral" or just overhyped and not worth their time.
To scope this solution, take 30-50 news articles and posts (an even distribution, if possible) and create a hypergraph of as many duplicate or near-duplicate articles as you can find (use the metadata to determine the similarity of a duplicate). Document the metadata for each article/post in your dataset and assess the metadata for duplicate information to create a similarity score; the most similar articles/posts will form the cluster of articles/posts related to each hyper-node. You decide the threshold for similarity, but 75% (0.75 f-score) similarity on metadata fields is the lowest recommendation likely to produce good results. Make sure to document the normalized information (the data the clustered articles/posts have in common) as metadata for the hyper-node, along with the similarity between each pair of hyper-nodes. Representing the similarity of metadata between hyper-nodes will allow the solution to scale: as new articles/posts are published, the metadata can be queried to identify whether a new article/post is a duplicate of an existing one or genuinely new.
The desired state would have two outputs: the first is the model and its populated hyper-graph, and the second is the similarity model, likely a machine learning model. The hyper-graph can be scoped to 30-50 hyper-nodes, with at least 2 duplicate or near-duplicate articles associated with each hyper-node (a total dataset of 60-100 individual articles/posts and their metadata). Each hyper-node will have the normalized metadata of the articles it represents, the similarity scores of the individual articles to one another, and the similarity score between each pair of hyper-nodes. The machine learning model should be open-source on GitHub, be flexible enough to be pointed at any news dataset that has standard metadata (such as Google News or social media news feeds like Twitter), and allow the similarity threshold to be modified.
This solution should be able to be used to 1.) identify duplicates in a static news dataset in the graph, 2.) identify if a new article is a duplicate of an existing article in the graph, 3.) enable others to use the similarity model on news datasets, and 4.) allow for a search engine to traverse the graph and retrieve the hyper-node (and the articles/posts it relates to) for retrieval and display, similar to how Google Scholar represents similar academic articles.
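A hedged Python sketch of the clustering core of this approach follows. The toy metadata records, the field-match similarity function, and the demo threshold of 0.66 (the text above recommends roughly 0.75 on richer metadata) are all illustrative assumptions.

```python
# Minimal sketch: score article pairs on metadata similarity, connect
# pairs above a threshold, and let each connected component become one
# hyper-node holding the metadata its member articles share.
import networkx as nx

articles = {
    1: {"title": "storm hits coast", "author": "ap", "date": "2022-01-05"},
    2: {"title": "storm hits coast", "author": "ap", "date": "2022-01-06"},
    3: {"title": "storm hits coast towns", "author": "ap", "date": "2022-01-05"},
    4: {"title": "markets rally on earnings", "author": "rt", "date": "2022-01-05"},
}

def similarity(a, b):
    """Fraction of metadata fields with identical values."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

G = nx.Graph()
G.add_nodes_from(articles)
ids = list(articles)
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        s = similarity(articles[ids[i]], articles[ids[j]])
        if s >= 0.66:                  # demo threshold; text suggests ~0.75
            G.add_edge(ids[i], ids[j], similarity=s)

for members in nx.connected_components(G):
    if len(members) < 2:
        continue                       # singletons are not duplicates
    docs = [articles[m] for m in members]
    shared = {k: v for k, v in docs[0].items()
              if all(d[k] == v for d in docs)}
    print("hyper-node", sorted(members), "normalized metadata:", shared)
```

Checking a newly scraped article against existing hyper-node metadata, rather than against every stored article, is what lets the scheme scale.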
Resources
Dataset Example Resources:
- Google or general news API data can be found here https://newsapi.org/ and https://www.aakashweb.com/articles/google-news-rss-feed-url/.
- And the Twitter API can be used to gather social media news posts: https://developer.twitter.com/en/products/twitter-api
- Examples for article metadata and how these can be represented as a graph, not specifically news, can be found here: https://www.connectedpapers.com/
- An example for how Google clusters academic articles together can be found here https://scholar.google.com/scholar?cluster=2298612016001752644&hl=en&as_sdt=0,22.
- Standards on news metadata can be found here: https://iptc.org/standards/
- Help with mining Twitter feeds with python is located here: https://www.toptal.com/python/twitter-data-mining-using-python
- Similarity data science in TigerGraph resources: https://info.tigergraph.com/hubfs/Graph%20+%20AI%20World/Graph-AI-FPGA.pptx.pdf and https://www.tigergraph.com.cn/wp-content/uploads/2021/04/EN_OReilly-book-Grah-powered-Analytics-and-Machine-Learning-with-TIgerGraph-Early-release-April-2021.pdf
Graph For Better Living
Make life more enjoyable for humanity
8: Manage Your Personal Identity
Author: Ashleigh Faith, Director, Knowledge Graph and Semantic Search at EBSCO
The Problem
One person now generates more data per day than a typical space mission. With our data being used in ways we do not expect, or in ways that are potentially unethical, more citizens want to learn how to better track and manage their data. The concept of the personal knowledge graph has emerged, but these graphs are still focused on the information someone wants to keep, not on their personal data and how companies are using it. With fraud and other malicious activity running rampant, a simple way to see the network of where your information has been shared, what information was shared, and when, would help citizen scientists track their data in order to ask to be forgotten, track risks to their information, and identify where a breach may have occurred.
The Challenge
A model that an individual can use without graph experience would allow people to track their own information and make better decisions about who has their data and what is being done with it. This model would include the most common data generated by an individual, such as nodes for personal information like birth date, unique identifiers like social security number, health records, bank information, etc.; how each piece of information is used by businesses, services, and institutions; and when the information was shared or updated. The specific values would not be entered, for security reasons, but the user would know where their birth date was shared, who it was shared with, and when, all in an easy-to-use, no-code data entry tool with a graphical visualization to help users track their data on their own.
Imagine the scenario where your mom needs to track down all the places her bank account and routing information are stored; perhaps she now knows that checks are not all that secure and wants to protect herself. This solution should enable her not only to enter where this information is stored across her network of information, but also to find the specific institutions, or types of institutions, that currently hold her account and routing numbers. Under regulations such as GDPR, she can then ask these companies to forget the sensitive information.
This scenario suggests a model built from the basic nodes of someone's day-to-day life, like companies, phone numbers, emails, shared sync accounts, people, and more (with the option for each node to carry specific metadata such as a label type, a label name like Bank of America or CVS, or a value such as $10), together with a set list of relations like purchased on, added on, added by, and started service, plus a general relation so the list is not too prescriptive (remember, this is for laypeople who usually don't think about relations between nodes). A minimal sketch follows.
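Here is a minimal Python sketch of such a graph, using a general-purpose graph library as a stand-in for whatever backend the no-code tool would use. The node kinds, relations, and dates are invented examples, and, per the security note above, only data types are stored, never actual values.

```python
# Minimal sketch of a personal-information graph: data types and
# institutions are nodes; edges record who holds what, and since when.
import networkx as nx

G = nx.MultiDiGraph()
G.add_node("bank_account", kind="personal_data")   # the data type only
G.add_node("birth_date", kind="personal_data")
G.add_node("Bank of America", kind="company")      # example from the text
G.add_node("utility_co", kind="company")           # hypothetical company

G.add_edge("bank_account", "Bank of America", relation="stored_at",
           shared_on="2019-03-01")
G.add_edge("bank_account", "utility_co", relation="shared_with",
           shared_on="2021-07-15")
G.add_edge("birth_date", "utility_co", relation="shared_with",
           shared_on="2021-07-15")

def who_has(graph, data_type):
    """List every institution holding a given data type, with dates."""
    return [(v, d["relation"], d["shared_on"])
            for _, v, d in graph.out_edges(data_type, data=True)]

# "Mom's" query: everywhere her bank account details live.
for holder, rel, date in who_has(G, "bank_account"):
    print(f"bank_account {rel} {holder} since {date}")
```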
Resources
- This solution is a personal information knowledge graph, NOT to be confused with a personal knowledge graph (https://www.strategicstructures.com/?p=2246). The latter is very similar but focuses on a person's "stuff" or assets, like documents, rather than on tracking their personal information, which is what the proposed solution is aimed at.
- Similar to corporate big-data knowledge graphs in social media (https://www.tigergraph.com/blog/exploring-reddit-marketing-networks-with-graph-databases/ or https://blog.kgbase.com/social-networks-in-a-knowledge-graph/ ), banking, and insurance ( https://www.tigergraph.com/solutions/financial-services/ or https://www2.deloitte.com/content/dam/Deloitte/nl/Documents/risk/deloitte-nl-risk-knowledge-graphs-financial-services.pdf ), this solution would be modeled similarly, but with the major difference of being layperson-friendly from a data entry and query perspective: if your grandma can use it, you have succeeded.
Dataset Example Resources:
- Example data would be data you would use in your own personal life. A good test of your solution would be to use your own data (remember not your specific banking or SSN, just high-level data types and entity names).
- Example models are listed above in resources.
Graph For Better Systems
Increase efficiencies across global processes
9: Develop Effective Public Transportation Systems
Author: Usha Rengaraju, Principal Data Scientist at MUST Research
The Problem
Public transport is the primary mode of travel in many countries, and an efficient, well-planned public transportation network saves time for commuters. Public transportation can be the most eco-friendly option for commuting, so it is important to make commuters' journeys as optimal as possible. Graphs are widely used for modeling and navigating complex network systems like public transportation.
The Challenge
Develop an effective public transportation system that is easier for commuters to navigate.
Traffic data modeling has a wide spectrum of applications, such as alleviating traffic congestion, supporting better travel decisions, and improving the quality of travel for end customers. Traffic flow prediction and the detection of anomalous events such as accidents can both be studied through traffic data modeling. Complex, nonlinear spatio-temporal correlations, coupled with the highly dynamic nature of road networks, make traffic data challenging to model. The inherent graph structure of traffic networks opens the door to many graph-based deep learning models, which have achieved significant results compared with other traffic data modeling approaches. However, many current graph approaches are unable to consider multiple passenger characteristics such as total travel time, minimum number of transfers between stations, and total distance of travel. Participants should come up with novel approaches to build a public transportation system that considers several parameters, like weather, nearby events, distance, waiting time, travel time, and number of transfers; a minimal routing sketch follows.
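To make the multi-parameter idea concrete, here is a minimal Python sketch that folds ride time, waiting time, and a transfer penalty into a single edge cost and runs Dijkstra over it. The toy network and the 8-minute penalty are illustrative assumptions; a real system would also weigh dynamic factors like weather and nearby events.

```python
# Minimal sketch of multi-criteria transit routing: combine several
# passenger-facing costs into one edge weight, then find the best path.
import networkx as nx

G = nx.DiGraph()
# (from, to, ride minutes, wait minutes, requires a transfer?)
legs = [("A", "B", 10, 2, False), ("B", "C", 15, 5, True),
        ("A", "C", 35, 1, False), ("C", "D", 5, 3, False)]

TRANSFER_PENALTY = 8   # perceived cost of changing vehicles, in minutes

for u, v, ride, wait, transfer in legs:
    cost = ride + wait + (TRANSFER_PENALTY if transfer else 0)
    G.add_edge(u, v, cost=cost)

path = nx.dijkstra_path(G, "A", "D", weight="cost")
total = nx.dijkstra_path_length(G, "A", "D", weight="cost")
print("best route:", " -> ".join(path), f"(perceived cost {total} min)")
```

Here the longer direct A-to-C leg beats the faster two-leg route because the transfer penalty is counted; tuning such penalties per passenger is one place the challenge invites novelty.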
Resources
- https://medium.com/open-government-products/rerouting-buses-using-data-science-part-i-4d6c9d4f1f
- https://medium.com/open-government-products/modelling-the-public-transport-network-part-ii-a6da2f3bd28c
- https://www.hindawi.com/journals/jat/si/249704/
Dataset Example Resources:
- https://carto.com/spatial-data-catalog/road-traffic-data/
- https://transdim.github.io/data/
- https://www.geofabrik.de/data/download.html
- https://data.world/cityofchicago/6iiy-9s97
- https://data.world/dcopendata/e85b5321a5a84ff9af56fd614dab81b3-53
Graph For Better World
Tackle social issues to make the world a better place to live
10: Enable Search For United Nations Sustainable Development Goals
Author: Ellie Young, Founder at Common Action
The Problem
Over the course of the pandemic, the world has broadly converged on the need to increase action across environmental and social sustainability goals. The UN Sustainable Development Goals (SDGs), developed by thousands of global community representatives together with the United Nations, represent the authoritative set of sustainability challenges for humanity to achieve by the end of this decade. Covering interconnected social, environmental, and economic targets, the SDGs represent urgent, life-saving action for billions of people worldwide.
Because the SDGs span interconnected systems, such as climate change, biodiversity, and poverty, impacts in one target area are often related to dynamic phenomena in other areas. This presents both a challenge and an opportunity for development efforts: although the interconnected nature of these challenges makes for complex implementation conditions, there is also the opportunity to exploit synergies between project efforts to address multiple goals simultaneously. For example, in impoverished rural areas, school systems can be set up to address education and encourage female empowerment at the same time, creating impact across three SDG goals.
Thus, a knowledge graph system has the potential to enable collective intelligence and coordination between actors, maximize resources, and, ultimately, increase impact.
However, although the interconnections between target areas are studied by scientists and international experts, no complete graphical or visual representation yet exists. Further, domain experts typically publish their findings in scientific papers and institutional reports. Therefore, the causal links between these elements are represented primarily in unstructured text distributed across a vast set of publications; for instance, the World Bank Open Knowledge Repository contains over 33,000 reports and is just one portal of development reports.
The Challenge
To unlock this vast information resource and support swifter, more impactful worldwide SDG action, participants are challenged to expose the rich contextual data published in reports from various development publication portals, such as the World Bank, the United Nations, the International Finance Corporation, and similar bodies. The winning solution will identify granular concepts and goals related to each of the 17 SDGs, and ideally how those concepts are interlinked with each other. For example, which topics mentioned in the IPCC climate change report are also discussed in the Millennium Ecosystem Assessment? These may also be enriched with additional data sources, such as open datasets. The application should be user-friendly and easy for a non-technical audience to navigate and understand; a minimal concept-linking sketch follows.
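As a small illustration of concept linking, the Python sketch below scores toy report passages against keyword descriptions of a few SDGs using TF-IDF cosine similarity. The goal keywords, passages, and threshold are assumptions; a production solution would extract concepts from the full reports listed in the resources.

```python
# Minimal sketch: link unstructured report text to SDGs by similarity
# between passages and keyword descriptions of each goal.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sdgs = {
    "SDG 4 Quality Education": "education schools learning teachers literacy",
    "SDG 5 Gender Equality": "gender equality women girls empowerment",
    "SDG 13 Climate Action": "climate change emissions warming adaptation",
}
passages = [
    "rural schools improve literacy and empower girls in the region",
    "warming trends demand urgent adaptation and lower emissions",
]

corpus = list(sdgs.values()) + passages
tfidf = TfidfVectorizer().fit(corpus)
sim = cosine_similarity(tfidf.transform(passages),
                        tfidf.transform(sdgs.values()))

names = list(sdgs)
for i, text in enumerate(passages):
    links = [(names[j], round(s, 2))
             for j, s in enumerate(sim[i]) if s > 0.05]
    print(text[:45], "->", links)
```

Passages that link to multiple goals at once, like the first one here, are exactly the cross-SDG synergies the challenge wants to surface.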
The Sustainable Development Goals
#1 No Poverty
#2 Zero Hunger
#3 Good Health and Well-Being
#4 Quality Education
#5 Gender Equality
#6 Clean Water and Sanitation
#7 Affordable and Clean Energy
#8 Decent Work and Economic Growth
#9 Industry, Innovation and Infrastructure
#10 Reduced Inequalities
#11 Sustainable Cities and Communities
#12 Responsible Consumption and Production
#13 Climate Action
#14 Life Below Water
#15 Life on Land
#16 Peace, Justice and Strong Institutions (violence, corruption, etc.)
#17 Partnerships for the goals
Resources
Here is a starting list of recommended reports and data to begin:
- Millennium Ecosystem Assessment - current global report
- Millennium Ecosystem Assessment - subglobal report
- The 2030 Agenda for Sustainable Development
- Global Assessment Report on Biodiversity and Ecosystem Services
- Intergovernmental Panel on Climate Change 2021 report
- The Sustainable Development Goals Report 2021
- International Union for Conservation of Nature Reports
- IUCN Ecosystems Taxonomy
- UN SDGS data platform
- 30,000 reports at the World Bank Open Knowledge Repository
Check out available datasets with our partner, Data.World, for inspiration and ideas.
11: Create STEM Opportunities for Women
Author: McKenzie Steenson, Computer Science Undergraduate Student at Boise State University | Developer Advocate Intern at TigerGraph
The Problem
“Women make up only 28% of the workforce in science, technology, engineering and math (STEM), and men vastly outnumber women majoring in most STEM fields in college. The gender gaps are particularly high in some of the fastest-growing and highest-paid jobs of the future, like computer science and engineering.” (AAUW)
Women in STEM face economic and educational challenges that may contribute to the low number of women continuing in STEM fields after college: "38% of women who major in computers work in computer fields, and only 24% of those who majored in engineering work in the engineering field." (Pew Research) Until these gaps are closed, companies and communities must come together to support women to succeed in math and science. The American Association of University Women outlines four keys to closing the STEM gap, yet dynamic, global solutions do not yet seem to exist. Giving women opportunities to advance their careers and connect with other strong women, especially in STEM careers, can help narrow the pay gap, which will in turn enhance women's economic security. It will also continue to develop a diverse and successful STEM workforce and prevent bias in the products and services created, which benefits all.
The Challenge
Here is a sample application, STEM Women, that highlights connecting women in STEM. The tool, a simple searchable database, was created to combat the lack of representation of women in STEM, but it is focused only on Australia rather than globally. A graph can be used to build a database of many people, places, and things, connect them through their relationships, and take those relationships global. For example, graph-powered machine learning for recommendation systems can guide women in STEM and related fields to the right job postings/openings, networking opportunities, diversity/inclusion events, mentorships, etc.; a minimal sketch follows. The winning solution will use graph technology to help create more economic opportunities for women pursuing or currently in STEM careers. Technology-driven solutions in the technology-driven field of STEM will continue the pursuit of closing the STEM gap, providing support and opportunity for all.
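As a hedged illustration of the graph recommendation idea, the Python sketch below connects members, skills, and opportunities and ranks opportunities by shared skills. Every name and record in it is hypothetical.

```python
# Minimal sketch: a small graph of members, skills, and opportunities;
# rank opportunities for a member by how many skills they share.
import networkx as nx

G = nx.Graph()
G.add_edge("Ada", "python", kind="has_skill")
G.add_edge("Ada", "machine_learning", kind="has_skill")
G.add_edge("ml_engineer_role", "python", kind="requires")
G.add_edge("ml_engineer_role", "machine_learning", kind="requires")
G.add_edge("data_mentorship", "machine_learning", kind="covers")
G.add_edge("frontend_role", "javascript", kind="requires")

def recommend(graph, person):
    """Score each opportunity by the skills it shares with the person."""
    skills = set(graph.neighbors(person))
    scores = {}
    for node in (n for n in graph if n not in skills and n != person):
        overlap = skills & set(graph.neighbors(node))
        if overlap:
            scores[node] = len(overlap)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend(G, "Ada"))
# [('ml_engineer_role', 2), ('data_mentorship', 1)]
```

Adding node types for events, mentors, and locations, plus graph features like shared neighbors across regions, is what would take such recommendations global.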
Resources
Use your imagination and unique data resources to capture the relationships between women in STEM, the opportunities available to them, and the communities that support them.
Dataset Example Resources:
- Gender Data Portal
- Minimum Set of Gender Indicators
- Human Development Data Center
- The Data on Women Leaders
12: Find Ethically-Sourced Goods
Author: Daniel Barkus, Developer Advocate at TigerGraph
The Problem
Consumers are increasingly aware of the "process" that goes into producing their goods. These processes are not always ethical and are often built on exploiting labor in less wealthy regions of the world. Various organizations attempt to track these labor and human rights violations, as well as the companies that profit from them, but that data is scattered, and companies often hide behind shell corporations or private investments to keep their names from being associated with the atrocities they exploit. Until customer awareness and brand accountability force these companies to change, these exploitations will continue to run rampant, shortening the lifespans of workers in developing countries and poisoning the land those people survive on.
The Challenge
Corporate investments, LLC filings, parent-company information, and part sourcing are all publicly available information. By combining the many different financial, corporate-hierarchy, and human rights data sources, it is possible to show not only which companies directly support these rights violations, but also which products are a direct result of them; a minimal traversal sketch follows. Consumers should know exactly what exploitations go into the products they consume and how the production and consumption of those products directly impacts the regions that produce them.
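As a minimal illustration, the Python sketch below traces a product through ownership and investment edges to documented violation records. All companies, products, and findings in it are fictional placeholders for data assembled from sources like those below.

```python
# Minimal sketch: walk ownership/investment paths from a product and
# collect any violation records reachable from it.
import networkx as nx

G = nx.DiGraph()
G.add_edge("budget_phone_x", "ShellCo LLC", relation="made_by")
G.add_edge("ShellCo LLC", "Parent Holdings", relation="owned_by")
G.add_edge("Parent Holdings", "MineCo", relation="invests_in")
G.add_node("MineCo_violation_2021", kind="violation",
           note="documented child-labor finding (fictional)")
G.add_edge("MineCo", "MineCo_violation_2021", relation="cited_for")

def violations_behind(graph, product):
    """Return every violation record reachable from a product node."""
    reachable = nx.descendants(graph, product)
    return [n for n in reachable
            if graph.nodes[n].get("kind") == "violation"]

print(violations_behind(G, "budget_phone_x"))
# ['MineCo_violation_2021']
```

The multi-hop path through a shell company and a holding firm is precisely the indirection such a graph is meant to see through.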
Resources
Dataset Example Resources:
- https://www.ohchr.org/en/publicationsresources/pages/databases.aspx
- https://dhsprogram.com/
- https://ourworldindata.org/child-labor