1 year ago

#337749

KEEP BUSY

Cross referencing data and removing entire lines that contain same string with Python

Cutting to the chase here...

I have 2 sets of data, both in .txt files. We will call them DATA A and DATA B.

I am currently collecting information from other students for a mailing list / club application as well as gathering a bit more data from them. This data goes into DATA A. DATA A currently looks like the following example:

exampleemail@email.com | How many years in college: 3 | Address: 123 Example Blvd.
exampleemail2@email.com | How many years in college: 1 | Address: 444 Example Blvd.
exampleemail3@email.com | How many years in college: 2 | Address: 567 Example Blvd.

However, when people sign up at the auditorium and aren't monitored, they tend to leave only their email and omit some of the other information, as in the following examples:

examplemail1@email.com | N/A | N/A
examplemail2@email.com | How many years in college: 1 | N/A
examplemail3@email.com | N/A | 111 Example Blvd.
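
To make the format concrete, here is a minimal sketch of how one of these lines could be split into fields and checked for missing information. It assumes that `|` is always the field separator and that a missing field is written exactly as `N/A`; the function names `parse_line` and `is_incomplete` are hypothetical, chosen only for illustration.

```python
def parse_line(line):
    """Split one 'email | years | address' record into stripped fields."""
    # Assumption: '|' never appears inside a field.
    return [field.strip() for field in line.split("|")]


def is_incomplete(line):
    """Return True if any field after the email is the literal 'N/A'."""
    fields = parse_line(line)
    return any(field == "N/A" for field in fields[1:])
```

For example, `is_incomplete("examplemail2@email.com | How many years in college: 1 | N/A")` would be `True`, while a fully filled-in line would give `False`.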

When I compile both sets of data and run both files through the script, any entry that appears in both files should stay in DATA A but be REMOVED FROM DATA B. What is left should be ONLY FRESH, NOT ALREADY PRESENT data, either in DATA B itself or in an OUTPUT file. The output should contain ONLY THE EMAILS, not the blank fields, so I can email those people to ask for the missing information.

Here is an example:

EXAMPLE

DATA A:

exampleemail@email.com | How many years in college: 3 | Address: 123 Example Blvd.
exampleemail2@email.com | How many years in college: 1 | Address: 444 Example Blvd.
exampleemail3@email.com | How many years in college: 2 | Address: 567 Example Blvd.
exampleemail4@email.com | How many years in college: 2 | Address: 888 Example Blvd.

DATA B:

exampleemail3@email.com | N/A | N/A
exampleemail@email.com | How many years in college: 3 | N/A
exampleemail8@email.com | N/A | 888 Example Blvd.
examplemail12@email.com | N/A | N/A

+++ SCRIPT IS RUN AT THIS POINT +++ (Any COMMON data from DATA B is removed from DATA B OR an output file with only FRESH data is created)...

OUTPUT:

exampleemail8@email.com
examplemail12@email.com

Would love to know how to do this in Python - thanks!
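
For reference, one possible approach is to collect every email from DATA A into a set, then keep only the DATA B emails that are not in that set. The sketch below is not a definitive solution: it assumes the email is always the first `|`-separated field, compares emails case-insensitively (an assumption), and uses the hypothetical file names `data_a.txt`, `data_b.txt`, and `output.txt`.

```python
def extract_email(line):
    """Return the first '|'-separated field, stripped and lowercased."""
    # Assumption: emails are compared case-insensitively.
    return line.split("|", 1)[0].strip().lower()


def fresh_emails(lines_a, lines_b):
    """Return emails found in lines_b but not in lines_a, in file order."""
    known = {extract_email(line) for line in lines_a if line.strip()}
    fresh = []
    for line in lines_b:
        if not line.strip():
            continue  # skip blank lines
        email = extract_email(line)
        if email not in known:
            fresh.append(email)
            known.add(email)  # also drops repeats within DATA B
    return fresh


def write_fresh_emails(path_a="data_a.txt", path_b="data_b.txt",
                       path_out="output.txt"):
    """Read both files and write the fresh emails, one per line."""
    # File names are placeholders; DATA A is only read, never modified.
    with open(path_a) as file_a, open(path_b) as file_b:
        result = fresh_emails(file_a, file_b)
    with open(path_out, "w") as out:
        for email in result:
            out.write(email + "\n")
    return result
```

Calling `write_fresh_emails()` would leave DATA A and DATA B untouched and write the new emails to the output file; overwriting DATA B instead would be a small change to the last step.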

python

database

removing-whitespace

cross-reference

0 Answers
