Yesterday, in my post More SEMRush, I wrote about trying to automate more of my manual process for finding suitable backlinks to my blog.
Today, I was back at the manual process, and the first thing I noticed was that I really need some kind of tracking. When I go to a blog and look around, I often see a list of other blogs they follow. I treat these as leads.
I’ve been just copying them and pasting them into a Microsoft Word document. I’d like to automate the process of cleaning that up.
So I opened VS Code and created a script called extract_urls.py. I used CTRL + ~ to open a terminal. I ran extract_urls.py from that terminal, and it just sat there.
I am new to this editor. I am not used to Python scripts just sitting there with no output unless there is an infinite loop or something.
This is not an output-free infinite loop.

See that blue circle with the numeral “1”? Looks like that is my problem.
Clicking on that shows that I couldn’t import the docx module.

I actually wish it would just display the error in the terminal like I am used to. I guess it just takes some getting used to.
After trying again, it actually behaved like I normally expect and threw an exception to the screen. I’m not even sure how I got that to happen.

Anyhow. The fix is clear. I need to install the docx module. But I always like to make a copy of my conda env before installing any new packages, which raises the question… which environment am I even running?
I am so used to seeing the prompt tell me the env, but it doesn’t in this terminal. I guess I can find out with “conda info --envs”.
That actually lists all your environments, but places an asterisk next to the currently activated one.
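Another way to answer the same question is to ask Python itself. This is a minimal sketch, assuming the terminal was started from an activated conda env (conda's activation sets the CONDA_DEFAULT_ENV variable); the helper name active_env_name is just something I made up for illustration.

```python
import os
import sys


def active_env_name():
    """Return the active conda env name, or None if conda has not set one."""
    # Assumption: conda activation exported CONDA_DEFAULT_ENV into this shell.
    return os.environ.get("CONDA_DEFAULT_ENV")


print("conda env:  ", active_env_name())
print("interpreter:", sys.executable)  # path of the Python that is running
print("prefix:     ", sys.prefix)      # root folder of that Python install
```

If the env name comes back as None, the interpreter path is usually still enough to tell which environment the script is running in.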
Ok. Now that I know that I am using my usual “best_env”, I will clone that before installing docx.
conda create --name best_env_todays_date --clone best_env
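Since I do this before every install, the date-stamped name could be generated instead of typed. This is only a sketch: the clone_command helper and the YYYY-MM-DD naming are my own assumptions, and it just builds the command rather than running it.

```python
from datetime import date


def clone_command(source_env, on=None):
    """Build the conda command that clones source_env under a dated name."""
    on = on or date.today()
    # Hypothetical naming scheme: <env>_<YYYY-MM-DD>
    backup_name = f"{source_env}_{on.isoformat()}"
    return ["conda", "create", "--name", backup_name, "--clone", source_env]


# Print the command so it can be reviewed and pasted into the terminal.
print(" ".join(clone_command("best_env")))
```

Returning a list instead of one string keeps the door open for passing it to subprocess.run later, if I ever trust it enough to run unattended.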
BTW, I use “best_env” to just mean one that I have installed a bunch of stuff on, and so far it works for everything I want to do.
Now “conda install docx” produced a PackagesNotFoundError, so I guess we need to try “pip install docx”.
That worked, I guess, but it looks like it installed a lot:

Try running the script again. This time I get ModuleNotFoundError: No module named ‘exceptions’.
Hmmm… I didn’t even use the exceptions module, but it looks like docx does. I guess I have to install that now as well.
No luck with “conda install exceptions” or “pip install exceptions”. Found this Stack Overflow post: python – Unable to pip install exceptions Package – Stack Overflow
I guess try “pip install pyceptions”. That didn’t work either.
Found this Stack Overflow post: python – When import docx in python3.3 I have error ImportError: No module named ‘exceptions’ – Stack Overflow
Apparently I should never have pip installed docx. For Python 3, I should have done “pip install python-docx”.
Ok. Let’s try:
“pip uninstall docx”
“pip install python-docx”
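What makes this confusing is that both packages are imported as docx, so a successful import does not say which one is installed. Here is a small sketch for checking by distribution name using the standard library's importlib.metadata; the installed_version helper is a name I invented.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# After the swap, "docx" should report None and "python-docx" a version.
for dist in ("docx", "python-docx"):
    print(dist, "->", installed_version(dist))
```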
Well… That was progress. I now have a new error:
docx.opc.exceptions.PackageNotFoundError: Package not found at ‘leads.docx’
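My best guess is that this error just means the script could not find a file called leads.docx relative to the folder the terminal was sitting in. I did not confirm that, but a quick check along these lines would have shown where Python was actually looking (the locate helper is hypothetical):

```python
from pathlib import Path


def locate(filename):
    """Return the absolute path a relative filename resolves to, and whether it exists."""
    path = Path(filename).resolve()
    return path, path.is_file()


path, found = locate("leads.docx")
print(f"looking for:       {path}")
print(f"exists:            {found}")
print(f"working directory: {Path.cwd()}")
```

If the file shows as missing, either moving the document next to the script or passing the full path should be enough to get past this particular error.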
I could keep going, but I want to be practical. Is this really saving me time?
It just seems like trying to make this docx module work may not be worth it. Can’t I just paste into some kind of HTML editor instead?
So I tried just CTRL + A followed by CTRL + C in the Microsoft Word doc, then started a new post in WordPress. In the body, just CTRL + V. Save as a draft and preview. Then right-click on the preview and view the document source. Save as “leads.html”.
Then I changed my code to extract from an HTML file instead of a Microsoft Word document.
Now I get to use Beautiful Soup, which I already have installed.
This worked so much faster.
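For anyone curious what the extraction step looks like, here is a rough sketch of the idea. Note that it is not my actual script: I used Beautiful Soup, while this version uses the standard library's html.parser so it runs with no extra installs. The LinkCollector class, the extract_urls function, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip in-page anchors, mailto links, and empty hrefs.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_urls(html_text):
    """Return the unique link URLs in html_text, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(parser.urls))


sample = (
    '<p><a href="https://example.com/blog">A</a> '
    '<a href="#top">top</a> '
    '<a href="https://example.com/blog">A again</a></p>'
)
print(extract_urls(sample))
```

To run it on the saved page, read leads.html into a string and pass that to extract_urls instead of the sample.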
I now have all my lead URLs in a text file. I suppose it would be better to put them into a CSV. But at least I have the beginnings of a system for my leads.
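Moving from a text file to a CSV would not take much. Below is a minimal sketch using Python's csv module; the write_leads_csv helper and the "status" column are assumptions on my part, added because I said I want some kind of tracking.

```python
import csv
import io


def write_leads_csv(urls, out_file):
    """Write one row per lead URL, with a header and a starting status."""
    writer = csv.writer(out_file)
    writer.writerow(["url", "status"])  # "status" is a hypothetical tracking column
    for url in urls:
        writer.writerow([url, "new"])


# Demo with an in-memory buffer; a real run would use
# open("leads.csv", "w", newline="") instead.
buffer = io.StringIO()
write_leads_csv(["https://example.com/blog"], buffer)
print(buffer.getvalue())
```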