1 year ago
Cross-referencing data and removing entire lines that contain the same string with Python
Cutting to the chase here...
I have 2 sets of data, both in .txt files. We will call them DATA A and DATA B.
I am currently collecting information from other students for a mailing list / club application as well as gathering a bit more data from them. This data goes into DATA A. DATA A currently looks like the following example:
exampleemail@email.com | How many years in college: 3 | Address: 123 Example Blvd.
exampleemail2@email.com | How many years in college: 1 | Address: 444 Example Blvd.
exampleemail3@email.com | How many years in college: 2 | Address: 567 Example Blvd.
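Each line is an email followed by `|`-separated fields, so a natural first step is to split a line into the email and its remaining fields. This is a minimal sketch that assumes `|` never appears inside a field value. `parse_record` and `load_records` are illustrative names, not from any library:

```python
def parse_record(line: str) -> tuple[str, list[str]]:
    """Split 'email | field | field' into (email, [field, field]), trimming spaces."""
    email, *fields = (part.strip() for part in line.split("|"))
    return email, fields


def load_records(lines: list[str]) -> dict[str, list[str]]:
    """Map each email to its other fields, skipping blank lines.

    If the same email appears twice, the later line wins.
    """
    records = {}
    for line in lines:
        if line.strip():
            email, fields = parse_record(line)
            records[email] = fields
    return records


data_a = [
    "exampleemail@email.com | How many years in college: 3 | Address: 123 Example Blvd.",
    "exampleemail2@email.com | How many years in college: 1 | Address: 444 Example Blvd.",
]
records = load_records(data_a)
print(records["exampleemail2@email.com"])
# ['How many years in college: 1', 'Address: 444 Example Blvd.']
```

Keying a dict by email makes later lookups ("is this person already in DATA A?") a single dictionary access.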
However, when people sign up at the auditorium without anyone monitoring them, they tend to leave only their email and skip some of the other information, as in the following examples:
examplemail1@email.com | N/A | N/A
examplemail2@email.com | How many years in college: 1 | N/A
examplemail3@email.com | N/A | 111 Example Blvd.
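To tailor a follow-up email, it can help to know which fields each person skipped. This is a minimal sketch. It assumes the two columns after the email are always in the order shown (years in college, then address), and that a skipped field is written as `N/A`. `FIELD_NAMES` and `missing_fields` are hypothetical names:

```python
# Assumed column order after the email, based on the examples above.
FIELD_NAMES = ("years in college", "address")


def missing_fields(line: str) -> list[str]:
    """Names of the fields left as 'N/A' (or left out entirely) on a signup line."""
    fields = [part.strip() for part in line.split("|")][1:]
    # Treat columns that are missing altogether the same as 'N/A'.
    fields += ["N/A"] * (len(FIELD_NAMES) - len(fields))
    return [name for name, value in zip(FIELD_NAMES, fields) if value.upper() == "N/A"]


for line in [
    "examplemail1@email.com | N/A | N/A",
    "examplemail2@email.com | How many years in college: 1 | N/A",
    "examplemail3@email.com | N/A | 111 Example Blvd.",
]:
    print(line.split("|", 1)[0].strip(), missing_fields(line))
```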
When compiling both sets of data, I need the partially completed lines to stay in DATA A, but any DATA B line that matches an entry already in DATA A must be REMOVED FROM DATA B. The result should contain ONLY FRESH data (entries not already present), either in DATA B itself or in an OUTPUT file that contains ONLY THE EMAILS, not the blank fields, so I can email those people and ask for the missing information.
Here is an example:
EXAMPLE
DATA A:
exampleemail@email.com | How many years in college: 3 | Address: 123 Example Blvd.
exampleemail2@email.com | How many years in college: 1 | Address: 444 Example Blvd.
exampleemail3@email.com | How many years in college: 2 | Address: 567 Example Blvd.
exampleemail4@email.com | How many years in college: 2 | Address: 888 Example Blvd.
DATA B:
exampleemail3@email.com | N/A | N/A
exampleemail1@email.com | How many years in college: 3 | N/A
exampleemail8@email.com | N/A | 888 Example Blvd.
examplemail12@email.com | N/A | N/A
+++ SCRIPT IS RUN AT THIS POINT +++ (Any data in DATA B that is also in DATA A is removed from DATA B, or an output file containing only the FRESH data is created.)
OUTPUT:
exampleemail8@email.com
examplemail12@email.com
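One way to do the cross-reference is to put every DATA A email into a set and keep only the DATA B lines whose email is not in that set. This is a minimal sketch. It assumes two lines "match" when they have the same email (the text before the first `|`). It also compares emails case-insensitively, which is an assumption and not something the question specifies. `email_of` and `fresh_emails` are hypothetical helper names:

```python
def email_of(line: str) -> str:
    """Return the email part of an 'email | field | field' line."""
    return line.split("|", 1)[0].strip()


def fresh_emails(data_a: list[str], data_b: list[str]) -> list[str]:
    """Emails found in DATA B but not in DATA A, in DATA B order, each listed once."""
    # A set makes each membership test fast (average O(1)) instead of
    # rescanning DATA A for every DATA B line.
    known = {email_of(line).lower() for line in data_a if line.strip()}
    fresh = []
    for line in data_b:
        if not line.strip():
            continue  # ignore blank lines
        email = email_of(line)
        if email.lower() not in known:
            fresh.append(email)
            known.add(email.lower())  # report a repeated DATA B signup only once
    return fresh


data_a = [
    "exampleemail@email.com | How many years in college: 3 | Address: 123 Example Blvd.",
    "exampleemail2@email.com | How many years in college: 1 | Address: 444 Example Blvd.",
    "exampleemail3@email.com | How many years in college: 2 | Address: 567 Example Blvd.",
    "exampleemail4@email.com | How many years in college: 2 | Address: 888 Example Blvd.",
]
data_b = [
    "exampleemail3@email.com | N/A | N/A",
    "exampleemail1@email.com | How many years in college: 3 | N/A",
    "exampleemail8@email.com | N/A | 888 Example Blvd.",
    "examplemail12@email.com | N/A | N/A",
]
for email in fresh_emails(data_a, data_b):
    print(email)
```

With exact email matching, this prints `exampleemail1@email.com` as well as the two expected addresses. DATA A in the example lists `exampleemail@email.com`, not `exampleemail1@email.com`, so the two lines do not match under this rule.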
Would love to know how to do this in Python. Thanks!
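To work on the actual .txt files, the same idea can read both files, write the surviving DATA B lines back out, and write just their emails to a separate output file. This is a minimal sketch. The file names (`data_a.txt`, `data_b.txt`, `output.txt`) and the `filter_data_b` helper are hypothetical, and the case-insensitive email comparison is an assumption. Passing DATA B's own path as `fresh_lines_path` overwrites it in place, so keep a backup copy if the original matters:

```python
import tempfile
from pathlib import Path


def email_of(line: str) -> str:
    """Email part of an 'email | field | field' line (text before the first '|')."""
    return line.split("|", 1)[0].strip()


def filter_data_b(data_a_path: Path, data_b_path: Path,
                  fresh_lines_path: Path, emails_path: Path) -> None:
    """Keep only DATA B lines whose email is not already in DATA A.

    The surviving full lines go to fresh_lines_path (pass data_b_path to
    overwrite DATA B in place), and just their emails go to emails_path.
    """
    lines_a = data_a_path.read_text(encoding="utf-8").splitlines()
    lines_b = data_b_path.read_text(encoding="utf-8").splitlines()

    # Case-insensitive comparison is an assumption, not part of the question.
    known = {email_of(line).lower() for line in lines_a if line.strip()}
    kept = [line for line in lines_b
            if line.strip() and email_of(line).lower() not in known]

    # Both files were read completely above, so overwriting DATA B is safe here.
    fresh_lines_path.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    emails_path.write_text("".join(email_of(line) + "\n" for line in kept), encoding="utf-8")


# Demo with throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    data_a, data_b, output = folder / "data_a.txt", folder / "data_b.txt", folder / "output.txt"
    data_a.write_text(
        "exampleemail3@email.com | How many years in college: 2 | Address: 567 Example Blvd.\n",
        encoding="utf-8")
    data_b.write_text("exampleemail3@email.com | N/A | N/A\n"
                      "examplemail12@email.com | N/A | N/A\n", encoding="utf-8")
    filter_data_b(data_a, data_b, data_b, output)  # overwrite DATA B in place
    print(data_b.read_text(encoding="utf-8"), end="")
    print(output.read_text(encoding="utf-8"), end="")
```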
Tags: python, database, removing-whitespace, cross-reference