Justin Ng – A Human Being

Kaggle competition dataset coronavirus

COVID-19 Open Research Dataset Challenge (CORD-19)

An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House

Allen Institute For AI

and 8 collaborators

updated 4 days ago (Version 3)

Usability: 9.4

License: Other (specified in description)

Tags: business, natural and physical sciences, computer science, health, biology, and 3 more

Description

Dataset Description

In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.

Call to Action

We are issuing a call to action to the world’s artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This gives the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.

A list of our initial key questions can be found under the Tasks section of this dataset. These key scientific questions are drawn from the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) research topics and the World Health Organization’s R&D Blueprint for COVID-19.

Many of these questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions.

Prizes

Kaggle is sponsoring a $1,000 per task award to the winner whose submission is identified as best meeting the evaluation criteria. The winner may elect to receive this award as a charitable donation to COVID-19 relief/research efforts or as a monetary payment. More details on the prizes and timeline can be found on the discussion post.

Accessing the Dataset

We have made this dataset available on Kaggle, and are periodically updating it from its source. To learn more and access the latest copy of the dataset, you can also go here: CORD-19 | Semantic Scholar.

The license for each dataset can be found in the all_sources_metadata CSV file.
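As a minimal sketch of checking those licenses, the Python below tallies the values of one column in the metadata CSV. The column name `license` and the helper name `count_licenses` are assumptions made for illustration; this page only states that the file has 14 columns, so check the CSV header or the accompanying readme before relying on the name.

```python
import csv
from collections import Counter


def count_licenses(csv_path, column="license"):
    """Tally the values found in one column of the metadata CSV.

    `column` is an assumed name, not confirmed by the dataset page;
    a KeyError is raised if the header does not contain it.
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not in {reader.fieldnames}")
        # Empty or missing cells are counted as "unknown" so they stay visible.
        return Counter((row[column] or "unknown").strip() for row in reader)
```

Calling `count_licenses("all_sources_metadata_2020-03-13.csv").most_common()` would then list each license value with its row count, provided the assumed column exists.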

Acknowledgements

This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine – National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.

Data (2 GB)

Data Sources

2020-03-13/
    all_sources_metadata_2020-03-13.csv (14 columns)
    json_schema.txt
    all_sources_metadata_2020-03-13.readme
    biorxiv_medrxiv/
        biorxiv_medrxiv/
            0015023cc06b5362d332b3baf348d11567ca2fbb.json
            004f0f8bb66cf446678dc13cf2701feec4f36d76.json
            00d16927588fb04d4be0e6b269fc02f0d3c2aa7b.json
            013d9d1cba8a54d5d3718c229b812d7cf91b6c89.json
            01d162d7fae6aaba8e6e60e563ef4c2fca7b0e18.json
            01e3b313e78a352593be2ff64927192af66619b5.json
            02201e4601ab0eb70b6c26480cf2bfeae2625193.json
            0255ea4b2f26a51a3bfa3bd8f3e1978c82c976d5.json
            029c1c588047f1d612a219ee15494d2d19ff7439.json
            03ce432f27c7df6af22b92245a614db2ecb5de5f.json
            793 more
    comm_use_subset/
        comm_use_subset/
            000b7d1517ceebb34e1e3e817695b6de03e2fa78.json
            00142f93c18b07350be89e96372d240372437ed9.json
            0022796bb2112abd2e6423ba2d57751db06049fb.json
            00326efcca0852dc6e39dc6b7786267e1bc4f194.json
            00352a58c8766861effed18a4b079d1683fec2ec.json
            0043d044273b8eb1585d3a66061e9b4e03edc062.json
            0049ba8861864506e1e8559e7815f4de8b03dbed.json
            00623bf2715e25d3acacb3f210d6888ed840e3cb.json
            0072159e1ebecc889e9bcabb58bb45c47e18a403.json
            007618ad76a3548195ab5d11c1e2459931c91cd1.json
            1000+ more
    COVID.DATA.LIC.AGMT.pdf
    noncomm_use_subset/
        noncomm_use_subset/
            0036b28fddf7e93da0970303672934ea2f9944e7.json
            005c43980edf3fcc2a4d12ee7ad630ddb651ce6e.json
            006be99e337c84b8758591a54f0362353b24dfde.json
            00a00d0edc750db4a0c299dd1ec0c6871f5a4f24.json
            00e5a723d44eb9f2698c38b518eff85c00f9753b.json
            01297dffaf94c1314ca46088f7b829b8383c2f73.json
            013d9fb8719d3d3d47738f9f0604f3b643c4df57.json
            014e31dce7e3f2b1a7020a5debfbf228182f8b5e.json
            0167dddb0e2783a60841b8e6f2b4e4cb981904e2.json
            018b5b5f732e955d349e14a83481739502ae104c.json
            1000+ more
    pmc_custom_license/
        pmc_custom_license/
            002f09dfc9a1323a15bf72e349d8b733ac97a2aa.json
            0036e8891c93ae63611bde179ada1e03e8577dea.json
            00573277e6be50669016f770bc28ec2da0639a8f.json
            00683d59d56123ae85e080d00ef1b3edd3f7405d.json
            0104f6ceccf92ae8567a0102f89cbb976969a774.json
            01363927a2d74245f78e5850a085caf62836f9b8.json
            01732214b0e66594afaceb2f641102b42e1b4685.json
            017ca5bdac589a37196df7b8e077c4c371ab32da.json
            019ede0c6f1c02b64dea8e05e3bc8c7cb5811fae.json
            01cfb2699f116b6a9e107c5eb20b1c5327d554f0.json
            1000+ more
    biorxiv_medrxiv.tar
    comm_use_subset.tar
    2 more
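The layout of the per-paper JSON files is documented in json_schema.txt rather than on this page, so a cautious first step is to look at what the files actually contain. The sketch below is only an illustration: the helper name `summarize_keys` and the `limit` parameter are hypothetical, and it assumes nothing about the schema beyond each file being valid JSON. It searches recursively, so the nested subset directories are covered.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_keys(directory, limit=100):
    """Count top-level keys across up to `limit` JSON files under `directory`."""
    counts = Counter()
    # rglob descends into nested folders such as comm_use_subset/comm_use_subset.
    for i, path in enumerate(sorted(Path(directory).rglob("*.json"))):
        if i >= limit:
            break
        with path.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        # Only JSON objects have keys; other top-level types are skipped.
        if isinstance(doc, dict):
            counts.update(doc.keys())
    return counts
```

Running `summarize_keys("biorxiv_medrxiv")` on a local copy would show which top-level fields appear and how consistently, which can then be compared against json_schema.txt.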

About this file

CORD-19 dataset (2020-03-13)

2020-03-13/
    all_sources_metadata_2020-03-13.csv (46.93 MB)
    json_schema.txt (2.84 KB)
    all_sources_metadata_2020-03-13.readme (1000 B)
    biorxiv_medrxiv (1 directory)
    comm_use_subset (1 directory)
    COVID.DATA.LIC.AGMT.pdf (26.06 KB)
    noncomm_use_subset (1 directory)
    pmc_custom_license (1 directory)
    biorxiv_medrxiv.tar
    comm_use_subset.tar
    noncomm_use_subset.tar
    pmc_custom_license.tar
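For anyone who downloads one of the .tar archives instead of browsing the directories, the following sketch lists the JSON members of an archive without extracting it. It uses only Python's standard `tarfile` module; the helper name `list_json_members` is hypothetical, and the idea that each archive holds the same .json files as the matching directory is an assumption, not something this page states.

```python
import tarfile


def list_json_members(tar_path):
    """Return the names of .json files inside a tar archive, without extracting."""
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(tar_path, "r:*") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(".json")
        ]
```

For example, `len(list_json_members("biorxiv_medrxiv.tar"))` would give a file count to compare against the directory listing before committing to a full extraction.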

278,939 views

8,819 downloads

73 kernels

82 topics
