Spotify UK

2 Projects, page 1 of 1
  • Funder: UK Research and Innovation
    Project Code: EP/W024330/1
    Funder Contribution: 1,343,620 GBP

    A central aspect of science and engineering is to be able to answer "what if" questions. What will happen if this gene suffers a mutation? What are the public health consequences of having this social benefit cut? What can we do to mitigate disparities among social groups? To what extent are lockdowns useful to mitigate a pandemic? What ramifications will follow if failures occur at these points of a major logistical operation such as food supply chains? These are cause-effect questions. Answering them is hard because it involves change. Historical data may fail to capture the implications of change, placing causal questions outside the comfort zone in which data is used to inform decisions. It is one thing to predict the life expectancy of a smoker, as done by public health officials or insurance companies. It is much harder to understand what will happen if we convince someone to stop smoking, as historical data may have a substantial number of cases where people stopped smoking shortly before dying of respiratory disease, due to discomfort. A statistical or machine learning method oblivious to these causal explanations may actually say that stopping smoking is bad for one's health. Ideally, we would like to perform randomised controlled trials where the choice of action to be taken is decided by the flip of a coin, so that confounding factors between cause and effect are overridden. This removal of confounding is necessary to show convincingly, for instance, that a COVID-19 vaccination works due to biological processes as opposed to sociological confounding factors linking those who choose to be vaccinated and their health outcomes.
    However, in many cases such trials can be very expensive (understanding genetic networks involves a large experimental space) or unethical (we cannot force someone to smoke or not), and even when they take place, a controlled trial may not fully control the factor of interest (we can randomly assign a drug or placebo to a patient, but we may not have the means to make the patient comply with the treatment if they stay at home). Data scientists have not ignored these problems, and we can thank the hard work of epidemiologists, for instance, for presenting a convincing case establishing the harmful link between smoking and lung cancer. But without randomised trials, the answer to a "what if" question requires assumptions; otherwise it is unknowable. This means that causal inference progresses slowly and is prone to mistakes. Part of the reason is that, traditionally, methods for causal inference largely rely on pre-defined families of assumptions chosen by statisticians designing methods that will provide unambiguous answers. Applied scientists then choose to adopt a particular method according to what manages to be a good enough approximation to their understanding of the world (one simple case: assume we have no common causes that are not measured in the data!). Although there are tools for sensitivity analysis (what if assumptions are violated in some particular ways?), they don't address the main issue directly: a domain expert should be given the chance of specifying assumptions upfront in the way they see fit, and should be told not a single, artificially convenient answer, but what can indeed be disentangled from the observational data given the information provided. One of the reasons this workflow is not popular is the need for computationally intensive algorithms to deduce the consequences of such assumptions.
    This project has the ambition of changing the common practice for causal inference, increasing transparency and the speed by which we understand the limits of our knowledge and where to look in order to progress. It will rely on cutting-edge algorithms to provide a sandbox in which domain experts can express their knowledge in a very flexible way, while also offering the backend support for the sophisticated computational methods needed.

  • Funder: UK Research and Innovation
    Project Code: EP/Y034813/1
    Funder Contribution: 7,873,680 GBP

    The EPSRC Centre for Doctoral Training in Statistics and Machine Learning (StatML) will address the EPSRC research priority of the 'physical and mathematical sciences powerhouse' through an innovative cohort-based training program. StatML harnesses the combined strengths of Imperial and Oxford, two world-leading institutions in statistics and machine learning, in collaboration with a broad spectrum of industry partners, to nurture the next generation of leaders in this field. Our students will be at the forefront of advancing the core methodologies of data science and AI, crucial for unlocking the value inherent in data to benefit industry and society. They will be equipped with advanced research, technical, and practical skills, enabling them to make tangible real-world impacts. Our students will be ethical and responsible innovators, championing reproducible research and open science. Collaborating with students, charities and equality experts, StatML will also pioneer a comprehensive strategy to promote inclusivity, attract individuals from diverse backgrounds and eliminate biases. This will help diversify the UK's future statistics and machine learning workforce, essential for ensuring data science is used for public good. Data science and AI are now part of our everyday lives, transforming all sectors of the economy. To future-proof the UK's prosperity and security, it is essential to develop new methodology, specifically tailored to meet the big societal challenges of the future. The techniques underpinning such methods are founded in statistics and machine learning. Through close collaboration with a broad range of industry partners, our cohort-based training will support the UK in producing a critical mass of world-leading researchers with expertise in developing cutting-edge, impactful statistical and machine learning methodology and theory. 
    It is well documented in government and learned society reports that the UK economy has an urgent need for these people. The significant level of industry support for our proposal also highlights the necessity of filling this gap in the UK data science ecosystem. StatML will learn from and build upon our previous successful experiences in cohort training of doctoral students (our existing StatML CDT funded in 2018, as well as other CDTs at Imperial and Oxford). Our students will continue to produce impactful, internationally leading research in statistics and machine learning (as evidenced by our students' impressive publication record and our world-leading research environment, as rated by the REF 2021 evaluation), while complementing this with a bespoke cohort-based Advanced Training program in Statistics and Machine Learning (StatML-AT). StatML-AT has been developed from our experience and in partnership with industry. It will be responsive to emerging technologies and equip our students with the practical skills required to transform how data is used. It will be delivered by our outstanding academics from both institutions alongside industry leaders to ensure that students receive training in cutting-edge technologies, along with the latest ideas in ethics, responsible innovation, sustainability and entrepreneurship. This will be complemented by industrial and academic placements to allow the students to develop their own international network and produce high-impact research. Together, StatML and its partners will train 90+ students over 5 cohorts. More than half of these will be funded from external sources, including 25+ by industry, representing excellent value for money. Our diverse cohorts will benefit from a unique and responsive training program combining academic excellence, industry engagement, and interdisciplinary culture.
    This will make StatML a vibrant research environment inspiring the next methodological advancements to transform the use of data and AI across industry and society.
