Saturday, April 13, 2019

Perspectives | Business At The AI Ethical Crossroads


The Biggest AI Ethical Issues Businesses Need To Address Now—And How

Recent headlines have drawn attention to problems caused by disproportionate representation, or an outright lack of it, in the data sets selected and used to train machine learning models. Such bias can lead to unfair outcomes—such as a hiring algorithm that favors male over female applicants. It can come about when the underlying data either are not a good representation of the world or reflect existing unfair patterns.

Bias in corporate decision-making tools can cause companies to miss out on hiring the best talent and on bringing satisfied customers back for more. It can result not only in lost revenue from misdirected ads, scorched-earth headlines and loss of customer trust, but also in entire submarkets missed.

For example, voice recognition software trained on English or majority dialects may be suboptimal for minority-dialect speakers, including when it comes to ascertaining whether a customer’s tone of voice is pleased or displeased. Data curated online inherently give preference to active internet users over those still coming into the digital world. User patterns, such as what constitutes a meaningful interaction with a video recommendation tool or spikes in certain search questions during elections, are more likely to reflect the behavior and priorities of internet users over nonusers and the literate over the nonliterate. Given that two-thirds of countries have more men than women online and that a significant literacy gap still exists, this potentially leaves out well over a billion prospective users.

Addressing bias early avoids having it built into key strategic decisions, business plans and road maps, with an impact that could reverberate for years. While bias is a challenging problem to solve, much good work is underway. Companies can draw upon that work as they integrate machine learning tools into executive decision-making.

The following highlights five promising practices:

Use An Analyzer Or Third Parties To Audit For Bias
Bias analyzers are increasingly coming onto the market. IBM Watson has built an analyzer and mitigation toolkit that makes a machine learning model’s decision-making process transparent and detects potentially unfair outputs. For example, if a model recommends denying a loan to a woman-owned business, the analyzer might show that the decision was sensitive to gender, that a disproportionate number of loans were denied to women or that the training set lacked data on women-owned businesses. The analyzer might query whether the factors used to generate the outcome are the appropriate criteria for denying the loan, or suggest additional data to incorporate.
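
To make this concrete, here is a minimal sketch, in Python, of the kind of check such analyzers automate: comparing a model’s approval rates across groups and flagging a possible disparate impact. It is not IBM’s toolkit; the field names (“gender,” “approved”) and the four-fifths threshold, a common rule of thumb, are illustrative assumptions.

```python
# A minimal illustrative sketch, not IBM's toolkit: check whether a loan
# model's approval rate for one group falls below the common "four-fifths"
# rule of thumb relative to a reference group. Field names are assumptions.
from collections import defaultdict

def approval_rates(decisions):
    """decisions: list of dicts like {"gender": "F", "approved": True}."""
    totals, approved = defaultdict(int), defaultdict(int)
    for d in decisions:
        totals[d["gender"]] += 1
        approved[d["gender"]] += d["approved"]  # True counts as 1
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact(decisions, protected="F", reference="M"):
    """Ratio of the protected group's approval rate to the reference group's."""
    rates = approval_rates(decisions)
    return rates[protected] / rates[reference]

decisions = [
    {"gender": "F", "approved": False}, {"gender": "F", "approved": True},
    {"gender": "M", "approved": True},  {"gender": "M", "approved": True},
]
ratio = disparate_impact(decisions)
if ratio < 0.8:  # four-fifths rule of thumb
    print(f"Potential adverse impact: approval ratio = {ratio:.2f}")
```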

Other analyzers include Facebook’s Fairness Flow, an internal tool used to determine whether a machine learning algorithm is systematically providing poorer results to certain protected classes. Accenture has released a tool that helps its customers assess artificial intelligence models for discrimination, suggests adjustments and flags trade-offs in accuracy.

In some cases, the recommended course of correction might be to get an entirely new data set.

However, tools alone won’t get the job done. As noted by Anna Bethke, head of Intel Corp.’s AI for Social Good initiative, “A common misconception is the belief that bias can be solved by algorithms, when bias is also a cultural issue that requires cultural responses like dialogue and debate.” 

Inhi Cho Suh, general manager of IBM Watson Customer Engagement, describes a use case that aptly illustrates the need for diversity: A model built to handle inbound risks around a supplier might be trained using the company’s established playbooks and focused on optimizing cost and location. Other corporate concerns, such as sourcing from a country with unsavory labor practices, could inadvertently be overlooked.

For this reason, it’s critical that companies involve diverse human teams in their process: internal or external auditors whose incentives differ from those of the users of the algorithms and who can pose counterfactuals that challenge underlying assumptions and results. Ensuring human judgment remains integral can help developers both responsibly manage their AI tools and harness their benefits.

Train Models With High-Quality Data Sets
Many concerns regarding unfairly biased algorithmic outcomes can be traced back to the data set and the lack of inclusion and diversity in it. Researchers in the AI community are calling for governments to provide access to high-quality data sets.

In addition, the private sector is coming up with its own solutions. Mighty AI, an Intel Capital portfolio company, is one of a handful of startups labeling and cleaning customer data sets for training and validation of computer vision models. Earlier this year, IBM released a data set of over 1 million facial images to counterbalance the lack of diversity in those that are currently available.

In applying a machine learning algorithm to a particular data set, businesses need to ask themselves: What demographic is intended to be captured by the data set? What characteristics may be over- or underrepresented, and is this model appropriate for this data set? If data are lacking in representation, what additional data are needed to balance the data set? If such data do not exist, consider alternatives, such as incorporating synthetically created data or addressing the original problem in another manner.
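
As a rough illustration of these questions, the sketch below compares each group’s share of a training set against a target population share and flags underrepresented groups. The group names, target shares and tolerance are hypothetical; a real audit would require far more care in defining both.

```python
# Illustrative sketch: flag demographic groups whose share of the training
# data trails the share of the population the model is meant to serve.
# Group names, target shares and the tolerance are assumptions.
def representation_gaps(group_counts, target_shares, tolerance=0.05):
    """Return groups whose share of the data falls short of its target share."""
    total = sum(group_counts.values())
    gaps = {}
    for group, target in target_shares.items():
        actual = group_counts.get(group, 0) / total
        if actual < target - tolerance:
            gaps[group] = {"actual": round(actual, 3), "target": target}
    return gaps

counts = {"dialect_a": 9_000, "dialect_b": 800, "dialect_c": 200}
targets = {"dialect_a": 0.60, "dialect_b": 0.25, "dialect_c": 0.15}
print(representation_gaps(counts, targets))
# {'dialect_b': {'actual': 0.08, 'target': 0.25},
#  'dialect_c': {'actual': 0.02, 'target': 0.15}}
```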

Another approach is to select data sets that limit biased inputs. A Palo Alto-based company, still in stealth mode, is building deep learning models to identify malignant tumors. It chose to train its models with images of actual human tissue samples, rather than with doctor notes culled from electronic health records, which may be inherently biased given that doctors write them knowing their patients will read them. 

Finally, selecting and scoping the right data set from a project’s outset also helps mitigate the risk of biased outcomes. Instead of opportunistically processing available data through deep learning models, executives should identify the specific business problem to be solved. Only then can their teams look for and scope the appropriate data set and, if one is not available, consider alternatives.

Use Model Cards To Help Standardize Developer And User Decision-Making
Google researchers have proposed a model card to accompany every machine learning model. Their model card, like a nutrition information label, discloses a standardized set of information to enable machine learning developers and users to make informed decisions about the appropriateness of a particular model for a use case, as well as to evaluate and implement its outcomes.

Such information includes how a model was built, the assumptions made, the primary intended use case and end users—such as labeling data for entertainment versus enterprise solutions—and how the model might perform across different cultural, demographic or phenotypic groups. Information should not only include single categories such as “men,” “women” and “nonbinary” gender groups, but also consider intersectionality, simultaneously looking at two or more cultural, demographic or phenotypic characteristics.

Additional information could include model date, version, type, training data and a matrix of error classifications: false positive rate, false negative rate, false discovery rate and false omission rate, and their relative importance for particular data sets. A model trained to identify smiling older men, for example, would be more likely to produce false omissions on a data set of mixed ages and genders, and it would likely not be the appropriate model for that data set.

The researchers also propose including a “toxicity” score that rates the model’s performance across sensitive groups, as well as ethical considerations, challenges and recommendations. Preservation of data privacy should also be considered.
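
The proposal is a documentation format rather than code, but a hedged sketch of how a team might record similar fields programmatically, including per-group error rates, could look like the following. Every field value, group label and count here is invented for illustration and is not part of Google’s proposal.

```python
# Illustrative sketch of model-card-style record keeping; all values invented.
from dataclasses import dataclass, field

@dataclass
class GroupMetrics:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def rates(self):
        return {
            "false_positive_rate": self.fp / (self.fp + self.tn),
            "false_negative_rate": self.fn / (self.fn + self.tp),
            "false_discovery_rate": self.fp / (self.fp + self.tp),
            "false_omission_rate": self.fn / (self.fn + self.tn),
        }

@dataclass
class ModelCard:
    name: str
    version: str
    model_type: str
    intended_use: str
    training_data: str
    ethical_considerations: str
    group_metrics: dict = field(default_factory=dict)  # group or intersection -> GroupMetrics

card = ModelCard(
    name="smile-detector",
    version="1.2",
    model_type="image classifier",
    intended_use="entertainment photo tagging, not identity or hiring decisions",
    training_data="public face images, skewed toward older male subjects",
    ethical_considerations="performance varies across age and gender groups",
    group_metrics={
        "older men": GroupMetrics(tp=90, fp=5, fn=5, tn=100),
        "younger women": GroupMetrics(tp=60, fp=10, fn=35, tn=95),
    },
)
for group, metrics in card.group_metrics.items():
    print(group, metrics.rates())
```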

Transparency can help companies avoid making critical decisions based on the biased results of a machine learning model. However, transparency itself is not the end goal. As Dr. Amir Khosrowshahi, vice president and AI chief technology officer for Intel Corp., has noted, “Humans are not transparent, and transparency can sometimes be harmful.” Transparency potentially allows a human operator to game a model’s recommendations, undercutting the very safeguards transparency was intended to provide. Other guardrails should accompany model cards and increased transparency, such as diverse auditing teams that can evaluate which aspect of bias is being solved for and the risk of introducing new biases.

Add Randomness To Recommendations
Companies whose business models revolve around making good recommendations—books, movies, services—are constantly considering the balance between serving the user more of the same and providing expansive, diverse or novel choices. These recommendations are typically made based on collective user behavior and personal characteristics. If “Avengers: Infinity War” is the movie of choice for an enthusiastic online demographic, then this film may be more likely to be served up widely to all, regardless of actual interest. Patients from a poorer state might be shown more cost-effective treatments, rather than expensive alternatives offered to patients from wealthier states.

Recommendation engines are one of the most heavily researched areas in machine learning, with the Netflix Prize open competition taking place nearly a decade ago. Neural network models developed by 20th Century Fox combine historical customer data with temporal sequencing in a film—such as long versus short shots—that conveys information about movie type, plot and characters. 

Other encouraging developments in this space are underway. One rule of thumb, for ease of application, is for companies to incorporate a specific level of serendipity, say 10%, into their recommendations. Then, even if a user’s prior clicks or demographics lead the person into a bubble of “Star Wars” and action film recommendations—or more problematically, conspiracy theory videos—the engine can also throw in some historical fiction and documentaries, or in the case of patients, state-of-the-art treatments. To the extent the user selects from the random offerings, his or her choice would further improve the model’s overall effectiveness.
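
The serendipity rule of thumb is simple to sketch: swap roughly 10% of a ranked recommendation list for random picks drawn from outside the user’s usual bubble. The catalog, titles and replace-the-tail strategy below are assumptions for illustration, not any particular company’s engine.

```python
# Illustrative sketch of injecting ~10% serendipity into recommendations.
import random

def with_serendipity(ranked, catalog, rate=0.10, seed=None):
    """Replace roughly `rate` of the ranked list with random out-of-bubble picks."""
    rng = random.Random(seed)
    n_random = max(1, int(len(ranked) * rate))
    outside = [item for item in catalog if item not in ranked]
    picks = rng.sample(outside, min(n_random, len(outside)))
    # Keep the top of the ranked list and replace the tail with random picks.
    return ranked[: len(ranked) - len(picks)] + picks

ranked = ["Avengers: Infinity War", "Star Wars: The Last Jedi", "Rogue One",
          "Iron Man", "Thor", "Black Panther", "Captain Marvel", "Ant-Man",
          "Doctor Strange", "Guardians of the Galaxy"]
catalog = ranked + ["Wolf Hall", "Free Solo", "The Crown", "Apollo 11"]
print(with_serendipity(ranked, catalog, rate=0.10, seed=7))
```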

Weigh Human Bias Against Machine Bias
A machine learning algorithm that makes end-of-life predictions might not fit a patient from a demographic that differs from the patients in the underlying training data, which could be as broad as all patients in the electronic health records. Certain patients from minority populations could be underidentified for end-of-life care and palliative services. At first glance, hospitals may hesitate to deploy such tools until this potential for biased outcomes has been fully corrected—not an easy task. 

However, as noted by Dr. Stephanie Harman, founding medical director of Palliative Care Services and co-chair of the Health Care Ethics Committee at Stanford School of Medicine, doctors are constantly making similar judgments based on their professional experience—essentially, a doctor’s personal data set of patients and case studies encountered in practice. Studies have shown that physicians tend to under-refer patients for palliative care, for reasons such as overoptimism, time constraints and inertia.

Before throwing the baby out with the bathwater, perhaps the more relevant consideration for hospitals is not whether the machine’s output is biased, but whether the machine augmenting the human decision reduces overall bias. Such a tool, though still limited, can nonetheless provide important safeguards against human biases.

Looking Forward
As companies increasingly rely on machine learning solutions to inform key corporate decisions, AI ethical frameworks and risk management guidelines are being developed at board levels. This is a long-term conversation with broader implications. 

Ultimately, bias in machine learning algorithms comes from humans. Technology is uncovering latent biases deeply rooted in our history as a human race, bringing them into the light as never before. Cognitive biases, such as information bias, blind spots, confirmation bias and other deeply ingrained biases, and their impact on machine learning models, have only begun to be addressed.

As we seek to mitigate these to better serve the next billion customers, we also have a unique opportunity to reflect on our own humanity and challenge our human biases and assumptions—to create not only better AI, but a better world as well.


About the Author
The views expressed in this article are those of the author and do not necessarily reflect the views or policy positions of her employer.
Abigail Hing Wen serves as counsel to the Office of the AI CTO, Intel Corp., focused on emerging AI technologies and the ecosystem. She also partners closely with investors for Intel Capital’s AI investments and has worked with over 100 Silicon Valley startups, from incorporation to acquisition or IPO. Her debut novel is forthcoming in 2020. Twitter: @abigailhingwen.

You may know us for our processors. But we do so much more. Intel invents at the boundaries of technology to make amazing experiences possible for business and society, and for every person on Earth.
Harnessing the capability of the cloud, the ubiquity of the Internet of Things, the latest advances in memory and programmable solutions, and the promise of always-on 5G connectivity and artificial intelligence, Intel is disrupting industries and solving global challenges. Leading on policy, diversity, inclusion, education and sustainability, we create value for our stockholders, customers, and society.
