Research Data Access Resources: Math, Engineering, and Computer Science
This research guide identifies electronic datasets to support statistical research in the social sciences.
Overview
This section of the research guide identifies electronic datasets to support research in math, engineering, and computer science.
For a list of subject specialists covering these topics, please see the library's list of subject specialists.
Math
- National Science Digital Library: The National Science Digital Library (NSDL) provides high-quality online educational resources for teaching and learning, with current emphasis on the sciences, technology, engineering, and mathematics (STEM) disciplines, both formal and informal, institutional and individual, in local, state, national, and international educational settings. The NSDL collection contains structured descriptive information (metadata) about web-based educational resources held on other sites by their providers. These providers have contributed this metadata to NSDL for organized search and open access to educational resources via this website and its services.
- cIRcle: cIRcle is an open access digital repository for published and unpublished material created by the UBC community and its partners. Its BIRS (Banff International Research Station) collection contains thousands of mathematics videos, which are primary research data. The collection is described as the largest source of mathematics data, with more than 10TB of primary research by the best mathematicians in the world, coming from more than 600 institutions.
Engineering
- U.S. Energy Information Administration: The U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.
- ASTM International: ASTM International, formerly known as the American Society for Testing and Materials (ASTM), is a globally recognized leader in the development and delivery of international voluntary consensus standards.
- Prognostics Center of Excellence Data Set Repository: NASA's Prognostics Center of Excellence hosts the Prognostics Data Repository, which provides data sets for developing prognostic algorithms, including time series that run from a nominal state to a failed state.
- Alternative Fuels Data Center: The Alternative Fuels Data Center (AFDC) is a comprehensive clearinghouse of information about advanced transportation technologies. The AFDC offers transportation decision makers unbiased information, data, and tools related to the deployment of alternative fuels and advanced vehicles.
- Buildings Data Platform: The mission of the Buildings Data Platform is to collect and curate high-resolution, well-calibrated time series of building operational and indoor/outdoor environmental data, which are crucial to understanding and optimizing building energy efficiency performance and demand flexibility capabilities, as well as to benchmarking energy algorithms.
- ROSA P: ROSA P is the Repository and Open Science Access Portal of the United States Department of Transportation (US DOT) National Transportation Library (NTL).
- 4TU.ResearchData | science.engineering.design: 4TU.ResearchData, previously known as 4TU.Centre for Research Data, is a research data repository dedicated to the science, engineering, and design disciplines.
- Catena: Catena, the Digital Archive of Historic Gardens and Landscapes, is a collection of historic and contemporary images, including plans, engravings, and photographs, intended to support research and teaching in the fields of garden history and landscape studies.
Computer Science
- CRAWDAD: CRAWDAD is the Community Resource for Archiving Wireless Data, a wireless network data resource for the research community. This archive has the capacity to store wireless trace data from many contributing locations, and staff to develop better tools for collecting, anonymizing, and analyzing the data.
- Bitbucket: Bitbucket is a web-based version control repository hosting service owned by Atlassian, for source code and development projects that use the Git revision control system. Bitbucket previously also hosted Mercurial repositories; that support was removed in 2020.
- Stanford Network Analysis Project: Stanford Network Analysis Platform (SNAP) is a general-purpose network analysis and graph mining library. It is written in C++ and scales to massive networks with hundreds of millions of nodes and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is also available through NodeXL, a graphical front-end that integrates network analysis into Microsoft Office and Excel.
- IMPACT: The Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) program supports global cyber risk research & development by coordinating, enhancing, and developing real-world data, analytics, and information sharing capabilities, tools, models, and methodologies.
- CiteSeerX: CiteSeerX is an evolving scientific literature digital library and search engine that focuses primarily on the literature in computer and information science. CiteSeerX aims to improve the dissemination of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in the access of scientific and scholarly knowledge. Rather than creating just another digital library, CiteSeerX attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries.
- Software Heritage Archive: The long-term goal of the Software Heritage initiative is to collect all publicly available software in source code form together with its development history, replicate it massively to ensure its preservation, and share it with everyone who needs it.
- Social Computing Data Repository: The Social Computing Data Repository hosts data from a collection of many different social media sites, most of which have blogging capacity. Some of the prominent social media sites included in this repository are BlogCatalog, Twitter, MyBlogLog, Digg, StumbleUpon, del.icio.us, MySpace, LiveJournal, The Unofficial Apple Weblog (TUAW), and Reddit. The repository contains various facets of blog data, including blog site metadata such as user-defined tags, predefined categories, and blog site description; blog post level metadata such as user-defined tags and date and time of posting; blog posts; blog post mood (defined as the blogger's emotions when writing the blog post); blogger name; blog post comments; and blogger social network.
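Several of the resources above (for example SNAP and the Social Computing Data Repository) distribute network data, and the SNAP entry mentions calculating structural properties of graphs. As a rough illustration of what that means in practice, the following minimal Python sketch computes a degree histogram and a triangle count. It does not use SNAP's own API; the function names are hypothetical, and it assumes the data is a plain-text edge list with one whitespace-separated pair of integer node IDs per line and `#` marking comment lines, which may not match every dataset.

```python
from collections import defaultdict


def parse_edge_list(text):
    """Parse lines of 'u v' integer pairs; skip blank lines and '#' comments.

    The file layout is an assumption for this sketch, not a documented format.
    """
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        edges.append((int(parts[0]), int(parts[1])))
    return edges


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_histogram(adj):
    """Map each degree to the number of nodes that have that degree."""
    hist = defaultdict(int)
    for neighbors in adj.values():
        hist[len(neighbors)] += 1
    return dict(hist)


def count_triangles(adj):
    """Count triangles once each by enforcing the ordering u < v < w."""
    total = 0
    for u, neighbors in adj.items():
        for v in neighbors:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


if __name__ == "__main__":
    # Tiny made-up example graph, not taken from any listed repository.
    sample = "# FromNodeId ToNodeId\n1 2\n2 3\n1 3\n3 4\n"
    adjacency = build_adjacency(parse_edge_list(sample))
    print(sorted(degree_histogram(adjacency).items()))
    print(count_triangles(adjacency))
```

For networks at the scale these repositories describe, a dedicated library would be the more realistic choice; the sketch is only meant to show the shape of the computation on a small file.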
Data Science
- Pacific Northwest National Laboratory DataHub: Scientific Data Repository: Sharing and preserving data are central to protecting the integrity of science. DataHub, a Research Computing endeavor, provides tools and services to meet scientific data challenges at Pacific Northwest National Laboratory (PNNL). DataHub helps researchers address the full data life cycle for their institutional projects and provides a path to creating findable, accessible, interoperable, and reusable (FAIR) data products.
- UCI Machine Learning Repository: The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. It is used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times.
- Kaggle: Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.
Key Resources
- U.S. Census Bureau: "Serve(s) as the leading source of quality data about the nation's people and economy." Includes the Decennial Census, the American Community Survey, the Economic Census, and more.
- Data.gov: The home of the U.S. Government's open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more.