Justin Ng – A Human Being
Kaggle competition dataset coronavirus
COVID-19 Open Research Dataset Challenge (CORD-19)
An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House
Allen Institute For AI and 8 collaborators
Updated 4 days ago (Version 3)
Usability: 9.4
License: Other (specified in description)
Tags: business, natural and physical sciences, computer science, health, biology, and 3 more
Dataset Description
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.
Call to Action
We are issuing a call to action to the world’s artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This gives the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.
A list of our initial key questions can be found under the Tasks section of this dataset. These key scientific questions are drawn from the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) research topics and the World Health Organization’s R&D Blueprint for COVID-19.
Many of these questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions.
Prizes
Kaggle is sponsoring a $1,000 award per task, given to the winner whose submission is identified as best meeting the evaluation criteria. The winner may elect to receive this award as a charitable donation to COVID-19 relief/research efforts or as a monetary payment. More details on the prizes and timeline can be found in the discussion post.
Accessing the Dataset
We have made this dataset available on Kaggle, and are periodically updating it from its source. To learn more and access the latest copy of the dataset, see the CORD-19 page on Semantic Scholar.
The licenses for each dataset can be found in the all_sources_metadata CSV file.
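As a rough illustration of how the license information might be inspected, the sketch below tallies license values in the metadata CSV using only the Python standard library. The column name `license` and the helper name `count_licenses` are assumptions made for this example; check the actual CSV header and the accompanying readme before relying on them.

```python
import csv
from collections import Counter


def count_licenses(csv_path, license_column="license"):
    """Count how many metadata rows fall under each license value.

    `license_column` is an assumed column name, not confirmed by the source.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            # Missing or empty cells are grouped under "unknown".
            value = (row.get(license_column) or "").strip() or "unknown"
            counts[value] += 1
    return counts
```

For example, `count_licenses("all_sources_metadata_2020-03-13.csv").most_common()` would list the license values from most to least frequent, assuming the file has been downloaded to the working directory.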
Acknowledgements
This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine – National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.
Data (2 GB)
Data Sources
2020-03-13
    all_sources_metadata_2020-03-13.csv (14 columns)
    json_schema.txt
    all_sources_metadata_2020-03-13.readme
    biorxiv_medrxiv
        biorxiv_medrxiv
            0015023cc06b5362d332b3baf348d11567ca2fbb.json
            004f0f8bb66cf446678dc13cf2701feec4f36d76.json
            00d16927588fb04d4be0e6b269fc02f0d3c2aa7b.json
            013d9d1cba8a54d5d3718c229b812d7cf91b6c89.json
            01d162d7fae6aaba8e6e60e563ef4c2fca7b0e18.json
            01e3b313e78a352593be2ff64927192af66619b5.json
            02201e4601ab0eb70b6c26480cf2bfeae2625193.json
            0255ea4b2f26a51a3bfa3bd8f3e1978c82c976d5.json
            029c1c588047f1d612a219ee15494d2d19ff7439.json
            03ce432f27c7df6af22b92245a614db2ecb5de5f.json
            793 more
    comm_use_subset
        comm_use_subset
            000b7d1517ceebb34e1e3e817695b6de03e2fa78.json
            00142f93c18b07350be89e96372d240372437ed9.json
            0022796bb2112abd2e6423ba2d57751db06049fb.json
            00326efcca0852dc6e39dc6b7786267e1bc4f194.json
            00352a58c8766861effed18a4b079d1683fec2ec.json
            0043d044273b8eb1585d3a66061e9b4e03edc062.json
            0049ba8861864506e1e8559e7815f4de8b03dbed.json
            00623bf2715e25d3acacb3f210d6888ed840e3cb.json
            0072159e1ebecc889e9bcabb58bb45c47e18a403.json
            007618ad76a3548195ab5d11c1e2459931c91cd1.json
            1000+ more
    COVID.DATA.LIC.AGMT.pdf
    noncomm_use_subset
        noncomm_use_subset
            0036b28fddf7e93da0970303672934ea2f9944e7.json
            005c43980edf3fcc2a4d12ee7ad630ddb651ce6e.json
            006be99e337c84b8758591a54f0362353b24dfde.json
            00a00d0edc750db4a0c299dd1ec0c6871f5a4f24.json
            00e5a723d44eb9f2698c38b518eff85c00f9753b.json
            01297dffaf94c1314ca46088f7b829b8383c2f73.json
            013d9fb8719d3d3d47738f9f0604f3b643c4df57.json
            014e31dce7e3f2b1a7020a5debfbf228182f8b5e.json
            0167dddb0e2783a60841b8e6f2b4e4cb981904e2.json
            018b5b5f732e955d349e14a83481739502ae104c.json
            1000+ more
    pmc_custom_license
        pmc_custom_license
            002f09dfc9a1323a15bf72e349d8b733ac97a2aa.json
            0036e8891c93ae63611bde179ada1e03e8577dea.json
            00573277e6be50669016f770bc28ec2da0639a8f.json
            00683d59d56123ae85e080d00ef1b3edd3f7405d.json
            0104f6ceccf92ae8567a0102f89cbb976969a774.json
            01363927a2d74245f78e5850a085caf62836f9b8.json
            01732214b0e66594afaceb2f641102b42e1b4685.json
            017ca5bdac589a37196df7b8e077c4c371ab32da.json
            019ede0c6f1c02b64dea8e05e3bc8c7cb5811fae.json
            01cfb2699f116b6a9e107c5eb20b1c5327d554f0.json
            1000+ more
biorxiv_medrxiv.tar
comm_use_subset.tar
2 more
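The listing above suggests that each subset folder holds a folder of the same name, which in turn holds one JSON file per article. A minimal sketch for counting those files and peeking at one of them follows. The helper names are hypothetical, and the layout is an assumption read off the listing. The function reports only the top-level keys of a file rather than assuming field names, since the schema itself lives in json_schema.txt.

```python
import json
from pathlib import Path

# Subset folder names as they appear in the listing above.
SUBSETS = [
    "biorxiv_medrxiv",
    "comm_use_subset",
    "noncomm_use_subset",
    "pmc_custom_license",
]


def summarize_subsets(root, subsets=SUBSETS):
    """Return {subset: number of .json files found anywhere under root/subset}."""
    summary = {}
    for name in subsets:
        folder = Path(root) / name
        # rglob searches recursively, so a nested folder of the same name is covered.
        summary[name] = sum(1 for _ in folder.rglob("*.json")) if folder.is_dir() else 0
    return summary


def top_level_keys(json_path):
    """Load one article file and return its top-level keys, sorted."""
    with open(json_path, encoding="utf-8") as handle:
        document = json.load(handle)
    return sorted(document)
```

Calling `summarize_subsets("2020-03-13")` from the directory that contains the dated folder would give a per-subset file count, which can be compared with the "793 more" and "1000+ more" hints in the listing.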
About this file
CORD-19 dataset (2020-03-13)
2020-03-13
all_sources_metadata_2020-03-13.csv: 46.93 MB
json_schema.txt: 2.84 KB
all_sources_metadata_2020-03-13.readme: 1000 B
biorxiv_medrxiv: 1 directory
comm_use_subset: 1 directory
COVID.DATA.LIC.AGMT.pdf: 26.06 KB
noncomm_use_subset: 1 directory
pmc_custom_license: 1 directory
biorxiv_medrxiv.tar
comm_use_subset.tar
noncomm_use_subset.tar
pmc_custom_license.tar
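The four .tar files presumably bundle the same subsets as the folders above. One cautious way to check what an archive contains before unpacking it is to list its members with Python's standard `tarfile` module. The helper name `json_members` is hypothetical, and the internal layout of the archives is not stated in the source.

```python
import tarfile


def json_members(tar_path):
    """Return the names of .json files inside a tar archive, without extracting it."""
    # Mode "r" (the default) lets tarfile detect compression automatically.
    with tarfile.open(tar_path) as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".json")
        ]
```

For instance, `len(json_members("biorxiv_medrxiv.tar"))` would report how many article files the archive holds, assuming it has been downloaded locally.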
278,939 views
8,819 downloads
73 kernels
82 topics