Mailmap update #95634

Merged
merged 1 commit into from
Apr 8, 2022
dtolnay
Member

@dtolnay dtolnay commented Apr 4, 2022

I noticed there are a lot of contributors who appear multiple times in https://thanks.rust-lang.org/rust/all-time/, which makes their "rank" on that page inaccurate. For example Nick Cameron currently appears at rank 21 with 2010 contributions and at rank 27 with 1287 contributions, because some of those are from nrc⁠@ncameron.org and some from ncameron⁠@mozilla.com. In reality Nick's rank would be 11 if counted correctly, which is a large difference.

Solving this in a totally automated way is tricky because it involves figuring out whether Nick is 1 person with multiple emails, or is 2 people sharing the same name.
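To make the effect concrete, here is a minimal sketch of how a tally keyed by name+email splits one contributor into two rows, and how an explicit alias mapping (the role `.mailmap` plays) merges them. The function `tally` and the alias map are hypothetical and are not taken from the rust-lang/thanks code. Which of the two counts belongs to which address is also an assumption made only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical identity key: a contributor is identified by (name, email).
type Identity = (String, String);

/// Sum contribution counts per identity. `alias` maps a commit email to the
/// email it should be counted under; it stands in for what .mailmap provides.
fn tally(
    commits: &[(&str, &str, u32)],
    alias: &BTreeMap<&str, &str>,
) -> BTreeMap<Identity, u32> {
    let mut totals = BTreeMap::new();
    for &(name, email, count) in commits {
        // Use the aliased email when one is registered, else the raw email.
        let email = match alias.get(email) {
            Some(&canonical) => canonical,
            None => email,
        };
        *totals
            .entry((name.to_string(), email.to_string()))
            .or_insert(0) += count;
    }
    totals
}
```

With no aliases, the two addresses yield two rows (2010 and 1287). With one alias registered, they collapse into a single row of 3297. The hard part described above is deciding that the alias is correct in the first place, which this sketch does not attempt.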

This PR addresses a subset of the cases: only where a person has committed under multiple names using the same email. This is still not something that can be totally automated (e.g. by modifying https://github.com/rust-lang/thanks to dedup by email instead of name+email) because:

  • Some emails are not necessarily unique to one contributor, such as ubuntu@localhost.

  • It involves some judgement and mindfulness in picking the "canonical name" among the names used with a particular email. This is the name that will appear on thanks.rust-lang.org. Humans change their names sometimes and can be sensitive or picky about the use of names that are no longer preferred.

For the purpose of this PR, I've tried to stick to the following heuristics which should be unobjectionable:

  • If one of the names is currently set as the display name on the contributor's GitHub profile, prefer that name.

  • If one of the names is used exclusively over the others in chronologically newer pull requests, prefer the newest name.

  • If one of the names has whitespace and the other doesn't (i.e. is username-like), such as Foo Bar vs FooBar or foobar or foo-bar123, but the two otherwise closely resemble one another, then prefer the human-like name.

  • If none of the above suffice in determining a canonical name and the contributor has some other name set on their GitHub profile, use the name from the GitHub profile.

  • If there is no name on their GitHub profile but the profile links to their personal website which unambiguously identifies their preferred name, then use that name.
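The first four heuristics can be sketched as a priority chain. This is only an illustration of the ordering, not the process actually used for this PR, which involved manual judgement. The `Candidate` struct, the use of PR numbers as a proxy for chronology, and the letters-only comparison standing in for "closely resemble" are all assumptions. The personal-website heuristic is left out because it cannot be mechanized; a `None` result means a human has to decide.

```rust
/// One name seen with a given email, plus the range of pull requests it
/// appears in. Treating PR numbers as chronological order is an assumption
/// of this sketch.
struct Candidate<'a> {
    name: &'a str,
    oldest_pr: u32,
    newest_pr: u32,
}

/// Letters only, lowercased: "Foo Bar" and "foo-bar123" both become "foobar".
/// A crude stand-in for "closely resemble one another".
fn letters(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_alphabetic())
        .flat_map(char::to_lowercase)
        .collect()
}

fn has_whitespace(name: &str) -> bool {
    name.trim().chars().any(char::is_whitespace)
}

/// `profile_name` is the display name on the contributor's GitHub profile,
/// looked up by hand; `None` if the profile has no name set.
fn pick_canonical<'a>(
    candidates: &[Candidate<'a>],
    profile_name: Option<&'a str>,
) -> Option<&'a str> {
    // 1. A candidate that matches the GitHub display name wins.
    if let Some(profile) = profile_name {
        if candidates.iter().any(|c| c.name == profile) {
            return Some(profile);
        }
    }
    // 2. A name used exclusively in newer PRs than every other name.
    for c in candidates {
        let others_are_older = candidates
            .iter()
            .filter(|o| o.name != c.name)
            .all(|o| o.newest_pr < c.oldest_pr);
        if others_are_older {
            return Some(c.name);
        }
    }
    // 3. Exactly one human-like name among names that otherwise resemble
    //    one another.
    let all_resemble = candidates
        .windows(2)
        .all(|w| letters(w[0].name) == letters(w[1].name));
    if all_resemble {
        let mut spaced = candidates.iter().filter(|c| has_whitespace(c.name));
        if let (Some(only), None) = (spaced.next(), spaced.next()) {
            return Some(only.name);
        }
    }
    // 4. Otherwise fall back to whatever name the profile shows, if any.
    profile_name
}
```

For the `Jake Degen {#73476}` and `Jakob Degen {#93606, ...}` pair reported in the review comments, the second rule selects `Jakob Degen`. Because PR numbers from different repositories are not comparable, any result from the second rule still deserves a manual check.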

I'm also thinking about how to handle cases like Nick's, but that will be a project for a different PR. Basically I'd like to be able to find cases of the same person making commits that differ in name and email by looking at all the commits present in pull requests opened by the same GitHub user.
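A rough sketch of that follow-up idea, assuming the GitHub login that opened each pull request has already been obtained somehow (the script below does not collect it). The function name and the input shape are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group commit identities by the GitHub login that opened the pull request
/// and keep only logins whose commits use more than one (name, email) pair.
/// Input tuples are (login, name, email); how the login is looked up is
/// outside this sketch.
fn multi_identity_logins<'a>(
    commits: &[(&'a str, &'a str, &'a str)],
) -> BTreeMap<&'a str, BTreeSet<(&'a str, &'a str)>> {
    let mut by_login: BTreeMap<&str, BTreeSet<(&str, &str)>> = BTreeMap::new();
    for &(login, name, email) in commits {
        by_login.entry(login).or_default().insert((name, email));
    }
    // A login with a single identity needs no mailmap entry.
    by_login.retain(|_, identities| identities.len() > 1);
    by_login
}
```

The result is only a list of candidates for review: a pull request can contain commits authored by other people, so two identities under one login do not prove they are the same person.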

script

```toml
[dependencies]
anyhow = "1.0"
git2 = "0.14"
mailmap = "0.1"
```
```rust
use anyhow::{bail, Context, Result};
use git2::{Commit, Oid, Repository};
use mailmap::{Author, Mailmap};
use std::collections::{BTreeMap as Map, BTreeSet as Set};
use std::fmt::{self, Debug};
use std::fs;
use std::path::Path;

const REPO: &str = "/git/rust";

fn main() -> Result<()> {
    let repo = Repository::open(REPO)?;
    let head_oid = repo
        .head()?
        .target()
        .context("expected head to be a direct reference")?;
    let head = repo.find_commit(head_oid)?;

    let mailmap_path = Path::new(REPO).join(".mailmap");
    let mailmap_contents = fs::read_to_string(mailmap_path)?;
    let mailmap = match Mailmap::from_string(mailmap_contents) {
        Ok(mailmap) => mailmap,
        Err(box_error) => bail!("{}", box_error),
    };

    let mut history = Set::new();
    let mut merges = Vec::new();
    let mut authors = Set::new();
    let mut emails = Map::new();
    let mut all_authors = Set::new();
    traverse_left(head, &mut history, &mut merges, &mut authors, &mailmap)?;
    while let Some((commit, i)) = merges.pop() {
        let right = commit.parents().nth(i).unwrap();
        authors.clear();
        traverse_left(right, &mut history, &mut merges, &mut authors, &mailmap)?;
        for author in &authors {
            all_authors.insert(author.clone());
            if !author.email.is_empty() {
                emails
                    .entry(author.email.clone())
                    .or_insert_with(Map::new)
                    .entry(author.name.clone())
                    .or_insert_with(Set::new);
            }
        }
        if let Some(summary) = commit.summary() {
            if let Some(pr) = parse_summary(summary)? {
                for author in &authors {
                    if !author.email.is_empty() {
                        emails
                            .get_mut(&author.email)
                            .unwrap()
                            .get_mut(&author.name)
                            .unwrap()
                            .insert(pr);
                    }
                }
            }
        }
    }

    for (email, names) in emails {
        if names.len() > 1 {
            println!("<{}>", email);
            for (name, prs) in names {
                let prs = DebugSet(prs.iter().rev());
                println!("    {} {:?}", name, prs);
            }
        }
    }

    eprintln!("{} commits", history.len());
    eprintln!("{} authors", all_authors.len());
    Ok(())
}

fn traverse_left<'repo>(
    mut commit: Commit<'repo>,
    history: &mut Set<Oid>,
    merges: &mut Vec<(Commit<'repo>, usize)>,
    authors: &mut Set<Author>,
    mailmap: &Mailmap,
) -> Result<()> {
    loop {
        let oid = commit.id();
        if !history.insert(oid) {
            return Ok(());
        }
        let author = author(mailmap, &commit);
        let is_bors = author.name == "bors" && author.email == "bors@rust-lang.org";
        if !is_bors {
            authors.insert(author);
        }
        let mut parents = commit.parents();
        let parent = match parents.next() {
            Some(parent) => parent,
            None => return Ok(()),
        };
        for i in 1..1 + parents.len() {
            merges.push((commit.clone(), i));
        }
        commit = parent;
    }
}

fn parse_summary(summary: &str) -> Result<Option<PullRequest>> {
    let mut rest = None;
    for prefix in [
        "Auto merge of #",
        "Merge pull request #",
        " Manual merge of #",
        "auto merge of #",
        "auto merge of pull req #",
        "rollup merge of #",
        "Rollup merge of #",
        "Rollup merge of  #",
        "Rollup merge of ",
        "Merge PR #",
        "Merge #",
        "Merged #",
    ] {
        if summary.starts_with(prefix) {
            rest = Some(&summary[prefix.len()..]);
            break;
        }
    }
    let rest = match rest {
        Some(rest) => rest,
        None => return Ok(None),
    };
    let end = rest.find([' ', ':']).unwrap_or(rest.len());
    let number = match rest[..end].parse::<u32>() {
        Ok(number) => number,
        Err(err) => {
            eprintln!("{}", summary);
            bail!(err);
        }
    };
    Ok(Some(PullRequest(number)))
}

fn author(mailmap: &Mailmap, commit: &Commit) -> Author {
    let signature = commit.author();
    let name = String::from_utf8_lossy(signature.name_bytes()).into_owned();
    let email = String::from_utf8_lossy(signature.email_bytes()).into_owned();
    mailmap.canonicalize(&Author { name, email })
}

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq)]
struct PullRequest(u32);

impl Debug for PullRequest {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        write!(formatter, "#{}", self.0)
    }
}

struct DebugSet<T>(T);

impl<T> Debug for DebugSet<T>
where
    T: Iterator + Clone,
    T::Item: Debug,
{
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.debug_set().entries(self.0.clone()).finish()
    }
}
```
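
The script only reports emails that were used with more than one name. Turning a report entry into a fix means adding a `.mailmap` line in the simple `Proper Name <email>` form, as in the diff for this PR. The following std-only sketch shows what such a line does. It is not the `mailmap` crate's implementation: the helper names are made up, and only this one line form is handled.

```rust
use std::collections::BTreeMap;

/// Render the simple .mailmap form, which maps every name used with `email`
/// to `canonical_name`.
fn mailmap_line(canonical_name: &str, email: &str) -> String {
    format!("{} <{}>", canonical_name, email)
}

/// Parse lines of that simple form into email -> canonical name.
/// Lines in any other form (for example, with a second email) are skipped;
/// this is deliberately not a full .mailmap parser.
fn parse_simple(mailmap: &str) -> BTreeMap<String, String> {
    let mut map = BTreeMap::new();
    for line in mailmap.lines() {
        let line = line.trim();
        if let Some((name, rest)) = line.split_once('<') {
            if let Some(email) = rest.strip_suffix('>') {
                let name = name.trim();
                if !email.contains('<') && !name.is_empty() {
                    // Emails are compared case-insensitively here.
                    map.insert(email.to_lowercase(), name.to_string());
                }
            }
        }
    }
    map
}

/// Return the canonical name for a commit author, or the commit's own name
/// when the email has no entry.
fn canonical_name<'a>(
    map: &'a BTreeMap<String, String>,
    name: &'a str,
    email: &str,
) -> &'a str {
    map.get(&email.to_lowercase())
        .map(String::as_str)
        .unwrap_or(name)
}
```

With the entry `Abhijeet Bhagat <abhijeet.bhagat@gmx.com>` in place, commits made as `abhi` and as `abhijeetbhagat` from that address both resolve to `Abhijeet Bhagat`, so they are counted as one contributor.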

@rust-highfive
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 4, 2022
@@ -7,51 +7,98 @@

Aaron Todd <github@opprobrio.us>
Abhishek Chanda <abhishek.becs@gmail.com> Abhishek Chanda <abhishek@cloudscaling.com>
Abhijeet Bhagat <abhijeet.bhagat@gmx.com>
Member Author

abhi {#35028, #34990, #34974, #34742}
abhijeetbhagat {#38797, #37636}

@@ -7,51 +7,98 @@

Aaron Todd <github@opprobrio.us>
Abhishek Chanda <abhishek.becs@gmail.com> Abhishek Chanda <abhishek@cloudscaling.com>
Abhijeet Bhagat <abhijeet.bhagat@gmx.com>
Abroskin Alexander <arkweid@evilmartians.com>
Member Author

A.A.Abroskin {#3582}
Abroskin Alexander {#3582}

Adolfo Ochagavía <aochagavia92@gmail.com>
Adrian Heine né Lang <mail@adrianheine.de>
Member Author

Adrian Heine {#31182}
Adrian Heine né Lang {#65639, #64606, #56162}

Adrien Tétar <adri-from-59@hotmail.fr>
Ahmed Charles <ahmedcharles@gmail.com> <acharles@outlook.com>
Alan Egerton <eggyal@gmail.com>
Member Author

Alan Egerton {#91426, #91353, #91324, #91318, #91230, #85287, #85284, #83895, #83863, #83372, #75048, #73138, #1169, #1166}
eggyal {#91318}

Adrien Tétar <adri-from-59@hotmail.fr>
Ahmed Charles <ahmedcharles@gmail.com> <acharles@outlook.com>
Alan Egerton <eggyal@gmail.com>
Alan Stoate <alan.stoate@gmail.com>
Member Author

Alan Stoate {#40837}
aStoate {#40837}

Ilyong Cho <ilyoan@gmail.com>
inquisitivecrystal <22333129+inquisitivecrystal@users.noreply.github.com>
Irina Popa <irinagpopa@gmail.com>
Member Author

Irina Popa {#55627, #52461, #50615, #50228}
Irina-Gabriela Popa {#46305}
Unknown {#54233}

Ivan Ivaschenko <defuz.net@gmail.com>
ivan tkachenko <me@ratijas.tk>
Member Author

ivan tkachenko {#73381}
ratijas {#72532}

J. J. Weber <jjweber@gmail.com>
Jack Huey <jack.huey@umassmed.edu>
Member Author

Jack {#72936, #63394, #63379}
Jack Huey {#85499, #85313, #84682, #84623, #84622, #84559, #84377, #83944, #83870, #83767, #83090, #82743, #81671, #81485, #80679, #80593, #80163, #80106, #79945, #78545, #77685, #77515, #76814, #75173, #73681, #72936, #72150, #71758, #69406, #7051, #6482, #5608, #5587}
jackh726 {#91243, #90017, #89970, #89915, #89914, #89823, #89345, #89344, #89285, #89001, #88846, #88811, #88771, #88441, #88336, #88312, #88061, #87903, #87900, #87478, #87281, #87246, #87244, #87203, #86993, #86697, #85499}

J. J. Weber <jjweber@gmail.com>
Jack Huey <jack.huey@umassmed.edu>
Jacob <jacob.macritchie@gmail.com>
Member Author

Jacob {#52946}
Strategic Technologies {#52946}

J. J. Weber <jjweber@gmail.com>
Jack Huey <jack.huey@umassmed.edu>
Jacob <jacob.macritchie@gmail.com>
Jacob Greenfield <xales@naveria.com>
Member Author

Jacob Greenfield {#59766}
xales {#12742, #11945, #11895, #11845, #11842, #11837, #11795}

Jacob Pratt <jacob@jhpratt.dev> <the.z.cuber@gmail.com>
Jake Vossen <jake@vossen.dev>
Member Author

= {#64898, #64884}
Jake Vossen {#80714, #77190, #69442}

Jacob Pratt <jacob@jhpratt.dev> <the.z.cuber@gmail.com>
Jake Vossen <jake@vossen.dev>
Jakob Degen <jakob@degen.com>
Member Author

Jake Degen {#73476}
Jakob Degen {#93606, #93449, #93387, #93109, #91840, #90845, #90819, #90288, #90221, #90218}

Jacob Pratt <jacob@jhpratt.dev> <the.z.cuber@gmail.com>
Jake Vossen <jake@vossen.dev>
Jakob Degen <jakob@degen.com>
Jakob Lautrup Nysom <jako3047@gmail.com>
Member Author

Jakob Lautrup Nysom {#1328}
Machtan {#1328}

Jake Vossen <jake@vossen.dev>
Jakob Degen <jakob@degen.com>
Jakob Lautrup Nysom <jako3047@gmail.com>
Jakub Adam Wieczorek <jakub.adam.wieczorek@gmail.com>
Member Author

Jakub Adam Wieczorek {#74557, #63406, #63227, #23332, #19780, #19242, #19158, #19118, #19115, #19087, #19073, #19049, #19013, #18978, #18752, #18555, #18542, #18493, #18468, #18438, #18356, #18324, #18265, #18264, #18171, #18099, #18015, #17983, #17952, #17948, #17944, #17722, #17721, #17634, #17603, #17415, #17414, #17413, #17410, #17311, #17279, #17199, #17130, #17085, #16883, #16568, #16567, #16554, #16537, #15870, #15862, #15808, #15754, #15700, #15650, #15615, #15508, #15507, #15489, #15486, #15454, #15388, #15336, #15272, #15186, #15163, #15086, #15081, #15078, #14867, #14854, #14752, #14731, #14696, #14643, #14605, #14562, #14561, #9148, #9061}
Jakub Wieczorek {#4308}
jakubadamw {#71026}

Jakub Adam Wieczorek <jakub.adam.wieczorek@gmail.com> <jakub.bukaj@yahoo.com>
Jakub Adam Wieczorek <jakub.adam.wieczorek@gmail.com> <jakub@jakub.cc>
Jakub Adam Wieczorek <jakub.adam.wieczorek@gmail.com> <jakubw@jakubw.net>
James [Undefined] <tpzker@thepuzzlemaker.info>
Member Author

James {#79087}
ThePuzzlemaker {#92381, #88706, #80284, #80226, #79606, #79082}

Renato Riccieri Santos Zannon <renato@rrsz.com.br>
Richard Diamond <wichard@vitalitystudios.com> <wichard@hahbee.co>
Ricky Hosfelt <ricky@hosfelt.io>
Ritiek Malhotra <ritiekmalhotra123@gmail.com>
Member Author

Ritiek Malhotra {#46231, #46119}
ritiek {#47609, #47185}

Rob Arnold <robarnold@cs.cmu.edu>
Rob Arnold <robarnold@cs.cmu.edu> Rob Arnold <robarnold@68-26-94-7.pools.spcsdns.net>
Robert Foss <dev@robertfoss.se> robertfoss <dev@robertfoss.se>
Robert Gawdzik <rgawdzik@hotmail.com> Robert Gawdzik ☢ <rgawdzik@hotmail.com>
Robert Habermeier <rphmeier@gmail.com>
Member Author

Robert Habermeier {#33663}
rphmeier {#31721}

Robert Millar <robert.millar@cantab.net>
Roc Yu <rocyu@protonmail.com>
Member Author

Roc Yu {#92947, #92936, #92361, #92283, #92269, #92188, #92095}
vacuus {#92095}

Rohit Joshi <rohitjoshi@users.noreply.github.com> Rohit Joshi <rohit.joshi@capitalone.com>
Roxane Fruytier <roxane.fruytier@hotmail.com>
Member Author

Roxane {#89282, #88390, #88280, #88039, #87996, #87554, #87161, #86965, #86869, #86726, #85724, #84730, #83521, #82536, #80635, #78432, #77873, #7051}
Roxane Fruytier {#86726, #78801}

Rohit Joshi <rohitjoshi@users.noreply.github.com> Rohit Joshi <rohit.joshi@capitalone.com>
Roxane Fruytier <roxane.fruytier@hotmail.com>
Rui <xiongmao86dev@sina.com>
Member Author

Rui {#4543}
xiongmao86 {#5425, #5141, #5058, #4543, #3535, #3530}

Russell Johnston <rpjohnst@gmail.com>
Rustin-Liu <rustin.liu@gmail.com>
Member Author

Rustin-Liu {#78431, #78347, #72258, #71512, #71511, #71270, #70643, #70576}
hi-rustin {#91310, #90867, #86427, #86426, #86320, #86192, #85617, #85355, #85104, #85018, #84854, #84574, #84285, #84185, #83673, #83468, #83020, #71511, #7237}

Russell Johnston <rpjohnst@gmail.com>
Rustin-Liu <rustin.liu@gmail.com>
Rusty Blitzerr <rusty.blitzerr@gmail.com>
Member Author

Blitzerr {#56906, #55953}
Rusty Blitzerr {#54343, #54072}
blitzerr {#76993}

Russell Johnston <rpjohnst@gmail.com>
Rustin-Liu <rustin.liu@gmail.com>
Rusty Blitzerr <rusty.blitzerr@gmail.com>
RustyYato <krishna.sd.2012@gmail.com>
Member Author

Ozaren {#70962, #56796}
RustyYato {#81730}

Ruud van Asseldonk <dev@veniogames.com> Ruud van Asseldonk <ruuda@google.com>
Ryan Leung <rleungx@gmail.com>
Member Author

Ryan Leung {}
rleungx {#50416, #2563}

Ryan Scheel <ryan.havvy@gmail.com>
Ryan Sullivant <rsulli55@gmail.com>
Member Author

Ryan Sullivant {#6119}
rsulli55 {#6119}

Wesley Wiser <wwiser@gmail.com> <wesleywiser@microsoft.com>
whitequark <whitequark@whitequark.org>
William Ting <io@williamting.com> <william.h.ting@gmail.com>
Wim Looman <wim@nemo157.com>
Without Boats <woboats@gmail.com>
Without Boats <woboats@gmail.com> <boats@mozilla.com>
Member Author

Without Boats {#53877, #53874, #53533, #51580}
boats {#49381, #49058, #48386}

Wim Looman <wim@nemo157.com>
Without Boats <woboats@gmail.com>
Without Boats <woboats@gmail.com> <boats@mozilla.com>
Xinye Tao <xy.tao@outlook.com>
Member Author

Xinye Tao {#88717}
tabokie {#88717}

Xuefeng Wu <benewu@gmail.com> Xuefeng Wu <xfwu@thoughtworks.com>
Xuefeng Wu <benewu@gmail.com> XuefengWu <benewu@gmail.com>
York Xiang <bombless@126.com>
Youngsoo Son <ysson83@gmail.com> <ysoo.son@samsung.com>
Youngsuk Kim <joseph942010@gmail.com>
Member Author

JOE1994 {#72834, #72829, #72214, #71284, #71254, #70397, #69326}
Youngsuk Kim {#73572, #71219, #71124, #70567, #70379, #70061, #69568, #66106}

Xuefeng Wu <benewu@gmail.com> Xuefeng Wu <xfwu@thoughtworks.com>
Xuefeng Wu <benewu@gmail.com> XuefengWu <benewu@gmail.com>
York Xiang <bombless@126.com>
Youngsoo Son <ysson83@gmail.com> <ysoo.son@samsung.com>
Youngsuk Kim <joseph942010@gmail.com>
Yuki Okushi <jtitor@2k36.org>
Member Author

JohnTitor {#83711, #83699}
Yuki Okushi {#93795, #93621, #91388, #91033, #91019, #90956, #90955, #90945, #90942, #90936, #90934, #90787, #90724, #90678, #90598, #90161, #90127, #90119, #90078, #90067, #90025, #89975, #89968, #89965, #89956, #89946, #89945, #89942, #89937, #89922, #89918, #89847, #89089, #89037, #87948, #87923, #87851, #87822, #87808, #87746, #87725, #87689, #87646, #87640, #87615, #87607, #87606, #87600, #87569, #87566, #87540, #87509, #87413, #87400, #87242, #87156, #87118, #87099, #87098, #87095, #87068, #87029, #86966, #86920, #86891, #86875, #86867, #86859, #86858, #86817, #86813, #86796, #86795, #86791, #86763, #86762, #86757, #86725, #86704, #86690, #86688, #86687, #86627, #86589, #86588, #86545, #86527, #86522, #86515, #86505, #86502, #86460, #86456, #86429, #86422, #86399, #86392, #86388, #86387, #86385, #86382, #86379, #86355, #86353, #86348, #86345, #86343, #86339, #86338, #86321, #86280, #86279, #86273, #86248, #86233, #86226, #86205, #86189, #86186, #86160, #86127, #86091, #86054, #86006, #85984, #85952, #85711, #85199, #85165, #85022, #84786, #84646, #84644, #84606, #84525, #84501, #84490, #83905, #83816, #83811, #83806, #83781, #83745, #83739, #83736, #83734, #83729, #83671, #83656, #83654, #83643, #83636, #83634, #83602, #83573, #83547, #83508, #83454, #83398, #83239, #83225, #83223, #83199, #83105, #83067, #83062, #83042, #82953, #82851, #82829, #82793, #82774, #82756, #82752, #82747, #82718, #82698, #82654, #82630, #82627, #82393, #82359, #82249, #82053, #82025, #81952, #81545, #81493, #81461, #81417, #81240, #81035, #80941, #80939, #80928, #80905, #80902, #80898, #80867, #80806, #80790, #80708, #80548, #80539, #80535, #80510, #80503, #80185, #80180, #80105, #80026, #80025, #79994, #79959, #79863, #79719, #79712, #79620, #78810, #78697, #78661, #78562, #78512, #78421, #78352, #78350, #78349, #78270, #78268, #78265, #78264, #78263, #78219, #78212, #78209, #78178, #78133, #78127, #78060, #78059, #78028, #77959, #77956, #77954, #77925, #77917, #77867, #77863, 
#77805, #77798, #77766, #77741, #77606, #77517, #77436, #77388, #75865, #75765, #75708, #75695, #75692, #75653, #75476, #75436, #75388, #75351, #75308, #75276, #75238, #75177, #75174, #75136, #75126, #75083, #75060, #75031, #75020, #74994, #74965, #74963, #74894, #74889, #74872, #74822, #74817, #74772, #74710, #74680, #74618, #74168, #74136, #74037, #73953, #73937, #73906, #73795, #73646, #72924, #72905, #72775, #72768, #72727, #72714, #72682, #72601, #72572, #72560, #72492, #71952, #71410, #71310, #71287, #71271, #71239, #71204, #71185, #71182, #71087, #71041, #70731, #70174, #69986, #69966, #69842, #69836, #69727, #69688, #69667, #69666, #69606, #69605, #69561, #69538, #69507, #69449, #69422, #69258, #69234, #69226, #69205, #69192, #69179, #69172, #69166, #69154, #69088, #68894, #68880, #68851, #68800, #68763, #68756, #68754, #68752, #68744, #68740, #68735, #68635, #68634, #68633, #68625, #68587, #68546, #68534, #68526, #68439, #68405, #68265, #68248, #68236, #68201, #68188, #68183, #68179, #68174, #68156, #68145, #68101, #68072, #68069, #68067, #68011, #67996, #67986, #67967, #67964, #67950, #67940, #67895, #67828, #67775, #67726, #67721, #67673, #67661, #67543, #67500, #67247, #67246, #67205, #67202, #67149, #67091, #67080, #66493, #66485, #66438, #66435, #66419, #66414, #66407, #66403, #66366, #66351, #66331, #66323, #66259, #66208, #66175, #66171, #65826, #65716, #65688, #65678, #65661, #65632, #65552, #65395, #65292, #65215, #64942, #64928, #64921, #64125, #64063, #63961, #63949, #63582, #63397, #63370, #63265, #63259, #63158, #63141, #63067, #63004, #62842, #62707, #62594, #62578, #62462, #62421, #62369, #62317, #62105, #62085, #62016, #62000, #61822, #61776, #61767, #61666, #61652, #61647, #61387, #60657, #60516, #60452, #60409, #60401, #60400, #60364, #60356, #60353, #60347, #60025, #59646, #59574, #59506, #59459, #59358, #58908, #57784, #57651, #57467, #57412, #8409, #7929, #7853, #7558, #7418, #6971, #6653, #6482, #6298, #6224, #6222, #6195, #6126, 
#5898, #5829, #5691, #5317, #5307, #5300, #5299, #5297, #5262, #5256, #5254, #5247, #5232, #5231, #5222, #5221, #5213, #5195, #5194, #5191, #5186, #5184, #5183, #5182, #5170, #5132, #5130, #5129, #5123, #5120, #5119, #5108, #5107, #5104, #5099, #5098, #5090, #5085, #5084, #5079, #5075, #5070, #5069, #5068, #5067, #5066, #5063, #5059, #5046, #5042, #5040, #5033, #5032, #5031, #5030, #5025, #5019, #5018, #5011, #5008, #5003, #5000, #4998, #4996, #4990, #4976, #4975, #4973, #4972, #4967, #4964, #4963, #4962, #4956, #4954, #4944, #4920, #4815, #4621, #4495, #4493, #4489, #4487, #4465, #4462, #4460, #4459}

Yuki Okushi <jtitor@2k36.org> <huyuumi.dev@gmail.com>
Yuki Okushi <jtitor@2k36.org> <yuki.okushi@huawei.com>
Yuning Zhang <codeworm96@outlook.com>
Member Author

Yuning Zhang {#57250, #3609}
codeworm96 {#2572}

@dtolnay dtolnay marked this pull request as ready for review April 4, 2022 05:04
@Mark-Simulacrum
Member

Thanks! I think this makes sense. In the long run, it's probably not sustainable to maintain these mappings, and maybe we should be thinking about dropping thanks entirely -- but for now this seems like a solid improvement and we can fix up any unintentional inaccuracies over time as needed.

@bors r+ rollup

@bors
Contributor

bors commented Apr 7, 2022

📌 Commit 05a467e has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 7, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Apr 7, 2022
Mailmap update

I noticed there are a lot of contributors who appear multiple times in https://thanks.rust-lang.org/rust/all-time/, which makes their "rank" on that page inaccurate. For example Nick Cameron currently appears at rank 21 with 2010 contributions and at rank 27 with 1287 contributions, because some of those are from nrc&rust-lang#8288;`@ncameron.org` and some from ncameron&rust-lang#8288;`@mozilla.com.` In reality Nick's rank would be 11 if counted correctly, which is a large difference.

Solving this in a totally automated way is tricky because it involves figuring out whether Nick is 1 person with multiple emails, or is 2 people sharing the same name.

This PR addresses a subset of the cases: only where a person has committed under multiple names using the same email. This is still not something that can be totally automated (e.g. by modifying https://github.com/rust-lang/thanks to dedup by email instead of name+email) because:

- Some emails are not necessarily unique to one contributor, such as `ubuntu@localhost`.

- It involves some judgement and mindfulness in picking the "canonical name" among the names used with a particular email. This is the name that will appear on thanks.rust-lang.org. Humans change their names sometimes and can be sensitive or picky about the use of names that are no longer preferred.

For the purpose of this PR, I've tried to stick to the following heuristics which should be unobjectionable:

- If one of the names is currently set as the display name on the contributor's GitHub profile, prefer that name.

- If one of the names is used exclusively over the others in chronologically newer pull requests, prefer the newest name.

- If one of the names has whitespace and the other doesn't (i.e. is username-like), such as `Foo Bar` vs `FooBar` or `foobar` or `foo-bar123`, but otherwise closely resemble one another, then prefer the human-like name.

- If none of the above suffice in determining a canonical name and the contributor has some other name set on their GitHub profile, use the name from the GitHub profile.

- If no name on their GitHub profile but the profile links to their personal website which unambiguously identifies their preferred name, then use that name.

I'm also thinking about how to handle cases like Nick's, but that will be a project for a different PR. Basically I'd like to be able to find cases of the same person making commits that differ in name *and* email by looking at all the commits present in pull requests opened by the same GitHub user.

<details>
<summary>script</summary>

```toml
[dependencies]
anyhow = "1.0"
git2 = "0.14"
mailmap = "0.1"
```
```rust
use anyhow::{bail, Context, Result};
use git2::{Commit, Oid, Repository};
use mailmap::{Author, Mailmap};
use std::collections::{BTreeMap as Map, BTreeSet as Set};
use std::fmt::{self, Debug};
use std::fs;
use std::path::Path;

const REPO: &str = "/git/rust";

fn main() -> Result<()> {
    let repo = Repository::open(REPO)?;
    let head_oid = repo
        .head()?
        .target()
        .context("expected head to be a direct reference")?;
    let head = repo.find_commit(head_oid)?;

    let mailmap_path = Path::new(REPO).join(".mailmap");
    let mailmap_contents = fs::read_to_string(mailmap_path)?;
    let mailmap = match Mailmap::from_string(mailmap_contents) {
        Ok(mailmap) => mailmap,
        Err(box_error) => bail!("{}", box_error),
    };

    let mut history = Set::new();
    let mut merges = Vec::new();
    let mut authors = Set::new();
    let mut emails = Map::new();
    let mut all_authors = Set::new();
    traverse_left(head, &mut history, &mut merges, &mut authors, &mailmap)?;
    while let Some((commit, i)) = merges.pop() {
        let right = commit.parents().nth(i).unwrap();
        authors.clear();
        traverse_left(right, &mut history, &mut merges, &mut authors, &mailmap)?;
        for author in &authors {
            all_authors.insert(author.clone());
            if !author.email.is_empty() {
                emails
                    .entry(author.email.clone())
                    .or_insert_with(Map::new)
                    .entry(author.name.clone())
                    .or_insert_with(Set::new);
            }
        }
        if let Some(summary) = commit.summary() {
            if let Some(pr) = parse_summary(summary)? {
                for author in &authors {
                    if !author.email.is_empty() {
                        emails
                            .get_mut(&author.email)
                            .unwrap()
                            .get_mut(&author.name)
                            .unwrap()
                            .insert(pr);
                    }
                }
            }
        }
    }

    for (email, names) in emails {
        if names.len() > 1 {
            println!("<{}>", email);
            for (name, prs) in names {
                let prs = DebugSet(prs.iter().rev());
                println!("    {} {:?}", name, prs);
            }
        }
    }

    eprintln!("{} commits", history.len());
    eprintln!("{} authors", all_authors.len());
    Ok(())
}

fn traverse_left<'repo>(
    mut commit: Commit<'repo>,
    history: &mut Set<Oid>,
    merges: &mut Vec<(Commit<'repo>, usize)>,
    authors: &mut Set<Author>,
    mailmap: &Mailmap,
) -> Result<()> {
    loop {
        let oid = commit.id();
        if !history.insert(oid) {
            return Ok(());
        }
        let author = author(mailmap, &commit);
        let is_bors = author.name == "bors" && author.email == "bors@rust-lang.org";
        if !is_bors {
            authors.insert(author);
        }
        let mut parents = commit.parents();
        let parent = match parents.next() {
            Some(parent) => parent,
            None => return Ok(()),
        };
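        // `parents` has already yielded the first parent, so `len()` is the
        // number of remaining parents; their indices start at 1.
        for i in 1..1 + parents.len() {
            merges.push((commit.clone(), i));
        }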
        commit = parent;
    }
}

fn parse_summary(summary: &str) -> Result<Option<PullRequest>> {
    let mut rest = None;
    for prefix in [
        "Auto merge of #",
        "Merge pull request #",
        " Manual merge of #",
        "auto merge of #",
        "auto merge of pull req #",
        "rollup merge of #",
        "Rollup merge of #",
        "Rollup merge of  #",
        "Rollup merge of ",
        "Merge PR #",
        "Merge #",
        "Merged #",
    ] {
        if summary.starts_with(prefix) {
            rest = Some(&summary[prefix.len()..]);
            break;
        }
    }
    let rest = match rest {
        Some(rest) => rest,
        None => return Ok(None),
    };
    let end = rest.find([' ', ':']).unwrap_or(rest.len());
    let number = match rest[..end].parse::<u32>() {
        Ok(number) => number,
        Err(err) => {
            eprintln!("{}", summary);
            bail!(err);
        }
    };
    Ok(Some(PullRequest(number)))
}

fn author(mailmap: &Mailmap, commit: &Commit) -> Author {
    let signature = commit.author();
    let name = String::from_utf8_lossy(signature.name_bytes()).into_owned();
    let email = String::from_utf8_lossy(signature.email_bytes()).into_owned();
    mailmap.canonicalize(&Author { name, email })
}

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq)]
struct PullRequest(u32);

impl Debug for PullRequest {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        write!(formatter, "#{}", self.0)
    }
}

struct DebugSet<T>(T);

impl<T> Debug for DebugSet<T>
where
    T: Iterator + Clone,
    T::Item: Debug,
{
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.debug_set().entries(self.0.clone()).finish()
    }
}
```
</details>
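
The script above prints, for each email, the names used with it and the pull requests in which each name appeared. As a rough illustration of how two of the heuristics from the description (the newest name and the human-like name) might pre-sort that output, here is a minimal sketch. It is not part of the script: `Names`, `normalize`, `newest_name`, `human_like_name`, `suggest_canonical` and `mailmap_line` are hypothetical names, and it assumes that a higher PR number means a chronologically newer pull request. Anything it cannot decide is left for manual review, since the other heuristics depend on GitHub profiles and human judgement.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical shape mirroring the script's inner map: name -> PR numbers.
type Names = BTreeMap<String, BTreeSet<u32>>;

// Lowercase the name and keep only letters, so `Foo Bar`, `FooBar`
// and `foo-bar123` all normalize to `foobar`.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_alphabetic())
        .flat_map(char::to_lowercase)
        .collect()
}

// Heuristic: a name whose oldest PR is newer than every PR of every other name.
// Assumption: PR numbers increase chronologically.
fn newest_name(names: &Names) -> Option<&str> {
    // Without PR data for every name there is no ordering to rely on.
    if names.values().any(|prs| prs.is_empty()) {
        return None;
    }
    names.iter().find_map(|(name, prs)| {
        let oldest = *prs.iter().next()?;
        let newer_than_all_others = names
            .iter()
            .filter(|(other, _)| *other != name)
            .all(|(_, other_prs)| {
                other_prs
                    .iter()
                    .next_back()
                    .map_or(false, |&newest| newest < oldest)
            });
        newer_than_all_others.then(|| name.as_str())
    })
}

// Heuristic: exactly one name contains whitespace and every other name
// closely resembles it (here: identical after `normalize`).
fn human_like_name(names: &Names) -> Option<&str> {
    let mut spaced = names.keys().filter(|n| n.contains(char::is_whitespace));
    let candidate = spaced.next()?;
    if spaced.next().is_some() {
        return None; // more than one human-like name: ambiguous
    }
    let key = normalize(candidate);
    names
        .keys()
        .all(|n| normalize(n) == key)
        .then(|| candidate.as_str())
}

// Apply the two heuristics in the order the description lists them.
// `None` means a human has to decide.
fn suggest_canonical(names: &Names) -> Option<&str> {
    newest_name(names).or_else(|| human_like_name(names))
}

// Format a suggestion as a simple `.mailmap` entry: `Proper Name <commit email>`.
fn mailmap_line(email: &str, names: &Names) -> Option<String> {
    suggest_canonical(names).map(|name| format!("{} <{}>", name, email))
}

fn main() {
    let mut names = Names::new();
    names.insert("foobar".to_owned(), BTreeSet::from([101, 205]));
    names.insert("Foo Bar".to_owned(), BTreeSet::from([150]));

    match mailmap_line("foo@example.com", &names) {
        Some(line) => println!("{}", line), // prints: Foo Bar <foo@example.com>
        None => println!("# needs manual review: <foo@example.com>"),
    }
}
```

A suggestion produced this way would still need a manual check against the contributor's GitHub profile before it goes into `.mailmap`, because the first heuristic in the description takes priority over both of these.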
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Apr 8, 2022
Mailmap update

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 8, 2022
Rollup of 7 pull requests

Successful merges:

 - rust-lang#95102 (Add known-bug for rust-lang#95034)
 - rust-lang#95579 (Add `<[[T; N]]>::flatten{_mut}`)
 - rust-lang#95634 (Mailmap update)
 - rust-lang#95705 (Promote x86_64-unknown-none target to Tier 2 and distribute build artifacts)
 - rust-lang#95761 (Kickstart the inner usage of `macro_metavar_expr`)
 - rust-lang#95782 (Windows: Increase a pipe's buffer capacity to 64kb)
 - rust-lang#95791 (hide an #[allow] directive from the Arc::new_cyclic doc example)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 7be3084 into rust-lang:master Apr 8, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 8, 2022
@dtolnay dtolnay deleted the mailmap branch April 8, 2022 20:15
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
6 participants