We’d venture to say that this tip falls under the “power user” umbrella since at first it may present itself as a bit more daunting than it really is to the casual computer user. Nevertheless, it’s a question that presented itself and the answer is one that will get you thinking in a manner that leads you through the doorway of efficiency, and so we are dutifully obliged to answer.
I have a text file wherein each line contains a city, state and zip code in the common format, “name of city, state zip”. I need to get this data into a spreadsheet so that I can sort it in a number of ways. The file is quite large though - is there a faster way of extracting this data than cut & paste?
There is a solution that was built just for this very type of situation, and that solution’s name is a regular expression. A regular expression is a tool you can use to search for patterns within text. In order to accomplish this task you are going to need a text editor that supports regular expressions, and Microsoft Notepad does not. The text editor we are going to employ to illustrate this use of regular expressions is called TextPad, and can be downloaded for free.
Overview
Here is an overview of how we plan to go about splitting up this data and importing it into a spreadsheet:
- Combine TextPad’s search & replace feature with a regular expression to transform the data into CSV format.
- CSV stands for comma separated values and is a file format that Excel (and any other spreadsheet program) will understand. We’ll explain more later.
- Import the CSV data from your text file into an Excel spreadsheet.
Ensure your text editor is configured properly
Open up the city/state/zip code file and ensure that your text editor is configured to interpret POSIX regular expressions. To do this in TextPad, click on Configure, click on Preferences and in the section titled Editor, make sure that there is a checkbox next to POSIX.
Combine search & replace with a regular expression to transform your data into CSV format
CSV stands for Comma Separated Values and is a file format that Excel (and any other spreadsheet program) will understand. The rules of CSV are as follows:
- Enclose the contents of each field in double quotation marks.
- Separate each field with a comma.
- Put each row on a separate line.
When the data inside your text file is in CSV format, Excel will be able to separate all of the rows and columns by examining the placement of your commas, double quotes and line breaks.
Now that you know what CSV is, how it works and why we’re using it, it’s time to transform your data into CSV. To do this you are going to use TextPad’s search & replace feature to search for a regular expression pattern and then inject quotes and commas into each line at the correct positions. Right now, your data should look like this:
New York, NY 10001
New York, NY 10002
New York, NY 10003
New York, NY 10004
New York, NY 10005
- Click Search, then click Search and Replace. A new window will open.
- Check off the box next to Regular Expression.
- Search for: ^([^,]+),\s*([^\s]+)\s+([0-9-]+)$
- Replace with (including quotes): “\1″,”\2″,”\3″
- Click Replace All, and then click Ok.
Your data will now look like this:
“New York”,”New York”,”10001″
“New York”,”New York”,”10002″
“New York”,”New York”,”10003″
“New York”,”New York”,”10004″
“New York”,”New York”,”10005″
Save your changes and close TextPad.
Import the CSV data from your text file into an Excel spreadsheet
Now you are going to import this textual CSV formatted data into the fields of a spreadsheet. Open up Excel, click on Data and move the mouse down to Get External Data. A new menu will slide out; click on Import Text File. When you click on “Import Text File”, a new window will pop up and ask you to open a file. Locate the text file containing your CSV data, select it and click on Open. After you press the “Open” button you will see the following window:

Make sure that delimited is checked off and click the Next button. You will see this:

Make sure that no option besides Comma is checked off and that the Text Qualifier drop-down box is set to ” (double quote), then click the Next button and then Finish. You will see this:

This dialog box is asking you where the data should be inserted. In this case, we are inserting the data into the current worksheet and starting at row 1, cell 1 (=$A$1). Click ok and your data will be imported into the spreadsheet, and you should see a final product that resembles this:

This may seem like a complex series of cryptic steps just to get some data into a spreadsheet, but we assure you that the end results, especially when you’re dealing with a very large amount of data, are thoroughly pleasing. Regular expressions are a powerful tool that, in this case, will accomplish in a fraction of a second what would take you hours to copy & paste (assuming of course that you are using more than the five zip codes we employed here for demonstrative purposes). After you understand the subject matter of this task, the most demanding part of the process is clicking “next next next next next”. We would like to go more in depth into regular expressions, but that’s beyond the scope of this article. We do however urge you to go find out more for yourself; the Wikipedia article on Regular Expressions is a good starting point if you want to learn more.



