Marking records as not a duplicate to exclude from the duplicate management tool

Bradley Henry · March 2016

I am trying to find out if there is a way to mark two records as NOT a duplicate outside of the Duplicate Management Tool so that when I run the process they do not get matched.

OK, so I have a number of records that share an email address but are not duplicates: siblings, parent and child, partners, you get the picture.

I don't actually use the dupe management tool to process the possible dupes, as it is very time consuming. Instead I export the CNs from it and use them to build an Excel spreadsheet where I can use formulas to match on certain criteria, such as email.

The problem is that when I find a NON dupe based on a matched email, I do nothing, so the next time I run the Duplicate Management Tool all of the previously checked and confirmed NON dupe records are there again.

I want to be able to tell the duplicate management tool that certain records are NOT a match and to skip them in the future.

I have access to the import/export functions as well as ImportOmatic, if that helps.

Amy Dana · March 2016

Can you use the duplicate management in RE for just those records you know are not dupes? For instance, if it flags George Smith and his son Frank, can you just find George/Frank in the duplicate list and mark them as not duplicates? That seems to be the only way I've found to have them not pop back up on the next run.

Melissa S Graves · March 2016

Amy - I think her question is about HOW one does that? Can you tell her how to do that?

Daniel Noga 2 · March 2016

Editing my post as I see I totally answered a different question from the one that was asked.

Bradley Henry · March 2016

Hi Amy, great idea, but the problem is that the Duplicate Management Tool has not always matched the specific constituents together.

For example, take three constituents, A, B and C. The tool has matched A with B because of a similar last name, and a different match has paired B with C because of a similar first name. A and C have the same email but have not been matched together in the DMT, so I cannot mark them as not a dupe.

Bradley.

Matt Page · March 2016

I *think* there's a workaround, but it's a bit slow. If you edit one of those records individually, a pop-up window often appears when you try to save it, telling you it's a duplicate and giving you the option to continue with the save, move to the duplicate record, or mark it as not a duplicate. I believe that marks the pair as not duplicates in the same way the main tool does, so it shouldn't appear when you run the main tool.

There's a challenge, obviously, in making the two records sufficiently similar in the first place without forgetting which is which (and it would be time consuming), but you might find with some of them that if you try to save them without making any changes at all, the dupe tool will pop up like this anyway.

My understanding is that somewhere in the back end of RE there is a table which has a list of pairs of records which are not duplicates, and both tools work with the same one.

Matt
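Matt's description of a back-end list of non-duplicate pairs can be modeled as a set of unordered pairs, so that (A, B) and (B, A) count as the same pair. This is a minimal Python sketch under that assumption only; it does not reflect RE's actual database tables, and the CNs are hypothetical.

```python
# Model a "not a duplicate" list as a set of unordered CN pairs.
# Illustration only: this is NOT RE's real schema.

def pair_key(cn_a, cn_b):
    """Order-independent key, so (A, B) and (B, A) are the same pair."""
    return frozenset((cn_a, cn_b))


def filter_candidates(candidates, not_duplicates):
    """Drop candidate pairs already confirmed as not duplicates."""
    return [(a, b) for a, b in candidates if pair_key(a, b) not in not_duplicates]


# Hypothetical constituent numbers, e.g. George (1001) and his son Frank (1002).
not_duplicates = {pair_key("1001", "1002")}
candidates = [("1002", "1001"), ("1001", "1003")]
print(filter_candidates(candidates, not_duplicates))  # [('1001', '1003')]
```

Keying on the pair rather than on a single record means George can still be matched against any other constituent; only the one confirmed pairing is suppressed.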

Amy Dana · March 2016

Bradley Henry:

Ugh. I guess I assumed that the duplication tool would still find them. Witchcraft, maybe?

Jen Claudy · March 2016

Ironically, I was just about to go in and try out this tool when I saw this thread. I used its count to finish a proposal for data scrubbing that has now just been shot down as having no ROI. I would love to know if I was even close on my estimate of how long it would take to review and resolve the 4122 potential duplicates in our database, but it appears that we won't be doing any of that work now.

Probably won't help with regard to the DMT in RE, but I have a Relationship Type of "POSSIBLE DUPLICATE" and one of "DUPLICATE RECORD" so that I can link records together as I come across them and then someday come back (when I have time!) and resolve them. It's also come in helpful when I see that Relationship Link as I'm doing other work...and I have a Business Rule for this situation.

Bradley Henry · March 2016

Hi Matt,

I guess the biggest issue is: what if you get distracted and forget to change the record back? One idea I am considering is to create an attribute where I can put the CN of the NOT a dupe record. That way, when I do my export, I can include this attribute and then remove those records straight away. I am thinking this is the way I may have to do it.

Bradley.
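A rough sketch of Bradley's attribute idea, assuming the attribute export produces one row per attribute entry, with a CN column and a column holding the CN of the confirmed NOT-a-dupe record. The column names `CN` and `NotDupeCN`, the sample data and the candidate pairs are all hypothetical; an attribute on either record is enough to drop the pair.

```python
import csv
import io
from collections import defaultdict

# Hypothetical attribute export: "NotDupeCN" holds the CN of a record
# already confirmed as NOT a duplicate of the record in "CN".
ATTRIBUTE_EXPORT = """CN,NotDupeCN
1001,1002
1003,1005
"""


def load_not_dupe_links(text):
    """Map each CN to the set of CNs it is confirmed not to duplicate."""
    links = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        links[row["CN"]].add(row["NotDupeCN"])
    return links


def is_confirmed_not_dupe(cn_a, cn_b, links):
    """True if either record's attribute points at the other one."""
    return cn_b in links.get(cn_a, set()) or cn_a in links.get(cn_b, set())


links = load_not_dupe_links(ATTRIBUTE_EXPORT)
# Hypothetical candidate pairs, e.g. rows matched on email in the spreadsheet.
candidates = [("1002", "1001"), ("1001", "1004"), ("1005", "1003")]
remaining = [p for p in candidates if not is_confirmed_not_dupe(*p, links)]
print(remaining)  # [('1001', '1004')]
```

Checking both directions means it does not matter which of the two records the attribute was added to.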

Amy Dana · March 2016

Jen Claudy:

How do they determine ROI on data cleanup???

Faith Murray · March 2016

Jen Claudy:

Maybe I am totally misunderstanding, but wouldn't it still be worthwhile to go through the 4122 potential duplicates in the Duplicate Management Tool anyway? Bradley's complaint, as I understand it, was not that the Duplicate Management Tool doesn't work, but that it is time-consuming, so he isn't marking records as non-dupes within the Tool itself and they keep showing back up. He is looking for a way to mark them non-dupes en masse. I agree that the Tool takes time, but once you have gone through it the first time, the maintenance is minimal. Personally, I find the Tool a better option than searching in Excel, because Excel doesn't handle fuzzy matching very well. On one particular import project, we hired a firm to do an Excel purge for dupes between our house list and an import file of new constituents. The Excel purge caught most of the dupes, but several hundred still remained when I began looking through them individually: misspelled names, moved addresses, and changed emails let them slip past Excel's search criteria. The advantage of the RE Dupe Tool is that it uses fuzzy matching to compare potential dupes, so a donor named Claeys is flagged as a potential dupe of another fellow named Clarys.

Also, yes, it's possible the Tool picks up only 75% of the possible dupes in your system, but once you go through them and merge them or mark them "not duplicates", that's still several thousand fewer dupes in your system potentially receiving duplicate mailings and wasting your organization's money on postage.
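To illustrate the difference Faith describes between exact matching and fuzzy matching, here is a small sketch using Python's standard `difflib.SequenceMatcher`. This is not RE's duplicate algorithm, which isn't described in this thread; the 0.8 threshold is an arbitrary assumption chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def looks_like_dupe(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two names as a possible duplicate if they are similar enough."""
    return similarity(a, b) >= threshold


print("Claeys" == "Clarys")                 # False: an exact match misses it
print(looks_like_dupe("Claeys", "Clarys"))  # True
print(looks_like_dupe("Claeys", "Smith"))   # False
```

An exact comparison, like a simple spreadsheet lookup, treats the two spellings as unrelated, while the similarity ratio (about 0.83 here) puts them above the threshold.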

Jen Claudy · March 2016

Amy Dana:

How do they determine ROI on data cleanup???

I don't know...I can't follow the logic, and there are additional conversations happening without me, so I'm more or less just getting the end-result decision. My original proposal had a list of 10 high-level categories of cleanup, and only 3 made it past the first consideration. Sigh.

Lance Dudek 2 · March 2016

Jen,

If it is any comfort, the duplicate management tool counts both records in a pair as duplicates, so when you fix one pair the count drops by two. That means you really have only about 2000 duplicates to resolve.
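The arithmetic behind Lance's point, assuming the tool's figure counts flagged records rather than pairs: with simple one-to-one matches there are about 4122 / 2 ≈ 2061 pairs to resolve. The sketch below counts pairs and distinct records for hypothetical match lists; when one record is matched to several others, as in Bradley's A, B and C example, the ratio is no longer exactly two.

```python
def summarize(pairs):
    """Return (number of flagged pairs, number of distinct records involved)."""
    records = {cn for pair in pairs for cn in pair}
    return len(pairs), len(records)


# One-to-one matches: each record appears in exactly one pair.
print(summarize([("A", "B"), ("C", "D")]))  # (2, 4)
# A chain like A-B and B-C: three records but only two pairs.
print(summarize([("A", "B"), ("B", "C")]))  # (2, 3)
```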

Jennifer Johnson · March 2016

We use the duplicate management tool in RE and try to keep the count at zero, but I feel that it doesn't catch everyone. So once a month I export all the individuals in the system and look for duplicates by first name, last name and address line 1 (FN, LN, Add1), or by last name and address line 1 (LN, Add1). Once I find the duplicates or household duplicates, I import a solicit code into those records so that I can find them in RE. Then I clean out the duplicates, and if I find that they are not duplicates (family or spouses), I mark one of the records with an attribute. Here is what we use:

Is NOT Head of Household/Should NOT receive direct mail
Not a duplicate household

Then when I go to pull the export again next month I exclude the individuals with these attributes.

Seems like a lot but this is how we were able to get the duplicate count down to zero and the cost of our mailing down.
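A minimal sketch of Jennifer's monthly check, assuming an exported list of individuals with hypothetical `CN`, `FN`, `LN`, `Add1` and `attributes` fields. Records carrying either of the attributes she lists are skipped, and the remaining records are grouped on normalized values of the chosen fields; the sample rows are made up.

```python
from collections import defaultdict

# Attribute descriptions quoted in the post; records carrying one are skipped.
EXCLUDE = {
    "Is NOT Head of Household/Should NOT receive direct mail",
    "Not a duplicate household",
}


def normalize(value: str) -> str:
    """Collapse whitespace and lowercase so trivial differences don't hide a match."""
    return " ".join(value.split()).lower()


def find_groups(rows, fields):
    """Return lists of CNs that share the same normalized values for `fields`."""
    groups = defaultdict(list)
    for row in rows:
        if row["attributes"] & EXCLUDE:
            continue  # already reviewed and marked as not a duplicate
        key = tuple(normalize(row[f]) for f in fields)
        groups[key].append(row["CN"])
    return [cns for cns in groups.values() if len(cns) > 1]


rows = [  # hypothetical export rows
    {"CN": "1", "FN": "Mary", "LN": "Lee", "Add1": "12 Oak St", "attributes": set()},
    {"CN": "2", "FN": "mary", "LN": "Lee", "Add1": "12 Oak  St", "attributes": set()},
    {"CN": "3", "FN": "Tom", "LN": "Lee", "Add1": "12 Oak St",
     "attributes": {"Not a duplicate household"}},
]
print(find_groups(rows, ["FN", "LN", "Add1"]))  # [['1', '2']]
print(find_groups(rows, ["LN", "Add1"]))        # [['1', '2']]
```

Without the attribute, record 3 would join the LN + Add1 group every month; marking it once keeps it out of future exports.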

Faith Murray · March 2016

Jen Claudy:

I feel for you! At our org, I pretty much get to do whatever cleanup I want in the system, as long as I can schedule my own time for it. Would hate to have the level of red tape it sounds like you have to go through.

Jen Claudy · March 2016

F Murray:

I feel for you! At our org, I pretty much get to do whatever cleanup I want in the system, as long as I can schedule my own time for it. Would hate to have the level of red tape it sounds like you have to go through.

Oh, I'm more than welcome to do any scrubbing I want...so long as it doesn't take time away from any of my other work, which means doing it from home on my own time. (I'm salaried, so no overtime.) Working from home is great; doing it on my own time is not. I am no longer willing to put in personal time (or at least not this much) on work projects unless there's a really good ROI for me personally, like time spent in the Community. A lot of that time is during the work day, when more people are actively posting, and then I spend personal time catching up on work projects (many of which are done more efficiently from home, where there aren't constant interruptions).

Matt Page · March 2016

I find the problem is working out whether they are duplicates or not. A lot of our constituents have only initials rather than first names, so we often have one constituent on the system with a first name and another with the matching initial. And then there's information on both records, so they need merging. Getting through 4000 of these will be a major hassle.

Matt

Nicole S · March 2016

I agree -- sometimes even figuring out if you've got a dupe or not is hard enough. And then trying to figure out which is the cleaner/more correct record when you've got different addresses or spellings.

DO NOT under any circumstance spend your personal time scrubbing data or anything else for work. You will only grow to resent the job. If it needs to get done, do it on the clock or not at all. I know that sounds harsh, but thems the breaks, as they say. Sounds like a catch-22. You need clean data to do your job, but you can't clean the data because you're doing your job... I have found that there is typically a slow time of the year when I can work on scrubbing. And even then it doesn't all get done. It's a never-ending project, which is another reason why you shouldn't do it in your personal time. It's not like you're going to spend a few hours and BOOM you're good to go. There is always more to do. Doing work on personal time gives the illusion that it's possible to get it all done. Your personal life will suffer. Before you know it, you will be a sad, downtrodden, put-upon DBA that the rest of the office will dump more and more work on.

Tracy Morgan · May 2016

For Matthew: we have gotten our hands on LexisNexis to cross-reference names, addresses and sometimes ages for constituents with either common names (e.g. John Pederson) or just initials (M. Johnson). The address usually lets us identify the individual and fill in the missing details, and LexisNexis has home addresses from years prior. This also gets more hits in the duplicate management tool! We also find deceased constituents this way...

Jen Claudy · May 2016

Tracy Morgan:

We currently have a Lexis-Nexis subscription, but are going to need to drop it because of cost. Blackbaud has told my boss that their tools (like Email Finder, Phone Finder, Deceased Record Finder, Age Finder) can completely replace any need for L-N, but I'm not thinking that's exactly true. I do a lot of one-off searches looking at potential relatives, etc. But it's not worth $7k+ a year, so I'll have to figure out where else to get at least some of that data...

Anne Boudreau · May 2016

We use Lexis/Nexis (Accurint) but don't pay for a yearly subscription. We pay per search -- it's about 50 cents for an advanced person search, 40 cents for an email search, $1 for a relative search. Don't have the price for a deceased person search off the top of my head, but this has been an incredibly inexpensive way for us to get the info we need.

Jeffrey Montgomery · May 2016

Hi folks, if you're struggling with dupes in RE, -please- try the 30-day free trial of MergeOmatic (or at least watch the six-minute video here). To answer the OP's question, you can easily mark records as not dupes (as well as see them all, or change some back if you later decide they are dupes). It also does a much better job of finding, ranking and displaying possible duplicates than anything else. If you like, one employee (or volunteer) can review the dupes and queue them for merging, and then a supervisor can review the queue before merging in batch. It also creates a confidence score for the matches, so you can sort and filter to work with the "low-hanging fruit" first. Finally, it archives practically all the data from the merged record so you could recreate it if you ever had to (and you'll have a full record of who was merged into whom). We spent years of developer time working on this solution; if you're glad to see someone working on this pervasive problem, please click "good answer" above. I look forward to hearing your feedback!

Jeff

Matt Page · May 2016

Hmmm the free trial does sound tempting. Thanks.

Matt

Matt Page · May 2016

I just received a copy of the Blackbaud Update email and it mentions KB article 52635 which has helped me work out another option for duplicates.

Basically in Business Rules, under "Duplicates" you can tick:

"When displaying constituent or duplicate search results

[] Display only constituents that the user has rights to view"

Which means that if you use Constituent Code to limit the range of constituents that a user can access, and then get them to run the "Duplicate Constituent Management Tool", it will only search for duplicates amongst those constituents.

So if you want to search only among a group of constituents, you just have to temporarily assign them a constituent code, limit a user's access to only those with that code (you could also make a special user account for this purpose), and then have that user run the Dupe Cons. Management Tool.

I think this might be quite useful for us. Hope it helps some of you as well. It's another tool for the kit anyway...

Matt
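The effect Matt describes, where the duplicate search only compares constituents the user is allowed to view, can be modeled as shrinking the candidate pool before any pairwise comparison. This sketch is only a model of that effect, not how RE implements security or duplicate checking; the `TEMPDUP` code, the records and the email comparison are hypothetical.

```python
from itertools import combinations


def candidate_pairs(records, visible_code, same):
    """Compare only records carrying `visible_code`, mimicking a user whose
    security rights are limited to that constituent code."""
    pool = [r for r in records if visible_code in r["codes"]]
    return [(a["CN"], b["CN"]) for a, b in combinations(pool, 2) if same(a, b)]


def same_email(a, b):
    """Hypothetical match rule: emails equal, ignoring case."""
    return a["email"].lower() == b["email"].lower()


records = [  # hypothetical data; "TEMPDUP" is a made-up temporary code
    {"CN": "1", "email": "a@example.org", "codes": {"TEMPDUP"}},
    {"CN": "2", "email": "A@example.org", "codes": {"TEMPDUP"}},
    {"CN": "3", "email": "a@example.org", "codes": {"Alumni"}},
]
print(candidate_pairs(records, "TEMPDUP", same_email))  # [('1', '2')]
```

Record 3 shares the email but is never compared, because it falls outside the restricted pool. Since comparing every pair among n records takes n(n-1)/2 comparisons, a smaller pool also means far fewer candidate pairs to review.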

Jo Ward · May 2016

I have created a Constituent attribute called Duplicate Check. It has a table associated with it, but only one currently used table entry: Not a Dup - Skip. I create an output query from the duplicate report and add the criteria that this attribute is missing from the records. It successfully eliminates those records that I have already confirmed as non-duplicates. I then proceed to export additional information for the remaining constituents and examine in Excel.
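Jo's query criterion (keep only records where the Duplicate Check attribute is missing) amounts to a simple record-level filter. A sketch with hypothetical CNs and attribute values:

```python
SKIP_VALUE = "Not a Dup - Skip"  # the table entry described in the post


def still_to_review(report_cns, duplicate_check):
    """Mirror the query criterion 'Duplicate Check attribute is missing':
    keep only CNs that have no Duplicate Check value at all."""
    return [cn for cn in report_cns if cn not in duplicate_check]


report_cns = ["1001", "1002", "1003"]   # hypothetical duplicate report output
duplicate_check = {"1002": SKIP_VALUE}  # hypothetical attribute values
print(still_to_review(report_cns, duplicate_check))  # ['1001', '1003']
```

Because the flag sits on a single record rather than on a pair, a record marked "Not a Dup - Skip" is excluded from every later match, including any genuine duplicate that turns up afterwards. Bradley's idea of storing the other record's CN in the attribute avoids that, at the cost of a slightly more involved filter.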
