BuiltWith: add API-based scraper (no Selenium) #243
Draft
Adaakal wants to merge 5 commits into hackforla:main from
Update (Oct 20, 2025): HTML heuristic refresh
Re-ran widget_probe_min.py (no API) across all 97 NC sites.
Current widget counts:
has_calendar: 63
Spot checks (first few domains per category):
Notes:
Status (WIP)
New script 311-data/webscraping/builtwith_api_scrape.py (BuiltWith/RapidAPI path).
Reads URLs from the wide-format NCsurvey.csv (the “NC URL (if avail)” row) and iterates over all 97 domains.
Current stopping point: the lookups complete, but the script crashes while building the CSVs with KeyError: ['technology'].
Why: the free BuiltWith/RapidAPI responses don’t always put technologies in the fields our parser expects.
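A minimal sketch of the URL-loading step, assuming "wide" means the first column of NCsurvey.csv holds row labels and each remaining column is one NC. The helper names (`to_domain`, `read_domains`) and the `www.`/dedup normalization are illustrative choices, not necessarily what builtwith_api_scrape.py does.

```python
from __future__ import annotations

import csv
import io
from urllib.parse import urlparse

URL_ROW_LABEL = "NC URL (if avail)"  # row label quoted in the PR description


def to_domain(url: str) -> str | None:
    """Normalize a cell like 'www.example.org/' or 'https://example.org/x' to a bare host."""
    url = url.strip()
    if not url:
        return None  # blank cell: this NC has no URL
    if "://" not in url:
        url = "http://" + url  # urlparse only fills netloc when a scheme is present
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host or None


def read_domains(csv_file, label: str = URL_ROW_LABEL) -> list[str]:
    """Return unique domains, in column order, from the row whose first cell is `label`.

    Assumes (hypothetically) a wide layout: labels in column 0, one NC per column after it.
    """
    for row in csv.reader(csv_file):
        if row and row[0].strip() == label:
            seen: dict[str, None] = {}  # dict keeps insertion order while deduplicating
            for cell in row[1:]:
                domain = to_domain(cell)
                if domain:
                    seen.setdefault(domain, None)
            return list(seen)
    raise ValueError(f"row {label!r} not found")


if __name__ == "__main__":
    sample = io.StringIO(
        "Question,NC A,NC B,NC C\n"
        "NC URL (if avail),https://www.exampleA.org/,exampleb.org,\n"
    )
    print(read_domains(sample))  # ['examplea.org', 'exampleb.org']
```

Against the real file, `read_domains(open("NCsurvey.csv", newline="", encoding="utf-8"))` would return the domain list to iterate over; blank cells (NCs without a URL) are skipped rather than sent to the API.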
Next steps:
Make the parser tolerant of multiple response shapes.
Guard the output step so it writes empty CSVs when no rows are returned.
(If the org has a full BuiltWith API key, run it again to populate the tech tables.)
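A sketch of the first two next steps, under stated assumptions. The two response shapes handled here (a BuiltWith-style `Results` / `Result` / `Paths` / `Technologies` nesting, and a flat top-level `technologies` list) are guesses that need checking against what the RapidAPI endpoint actually returns, and `TECH_COLUMNS`, `parse_technologies`, and `write_rows` are hypothetical names.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Iterator
from typing import Any, TextIO

# Hypothetical fixed output schema: the header is always written, even with zero rows.
TECH_COLUMNS = ["domain", "technology", "tag"]


def _as_list(value: Any) -> list:
    """Treat anything that is not a list as 'no entries' instead of raising."""
    return value if isinstance(value, list) else []


def _tech_entries(payload: Any) -> Iterator[Any]:
    """Yield raw technology entries from two ASSUMED response shapes."""
    if not isinstance(payload, dict):
        return
    # Assumed shape 1 (BuiltWith-style): Results -> Result -> Paths -> Technologies
    for result in _as_list(payload.get("Results")):
        inner = result.get("Result") if isinstance(result, dict) else None
        paths = inner.get("Paths") if isinstance(inner, dict) else None
        for path in _as_list(paths):
            if isinstance(path, dict):
                yield from _as_list(path.get("Technologies"))
    # Assumed shape 2 (flattened wrapper): a top-level "technologies" list
    yield from _as_list(payload.get("technologies"))


def parse_technologies(domain: str, payload: Any) -> list[dict]:
    """Convert one API response into rows; unrecognized shapes yield []."""
    rows = []
    for tech in _tech_entries(payload):
        if isinstance(tech, str):  # bare technology name
            tech = {"Name": tech}
        if not isinstance(tech, dict):
            continue
        name = tech.get("Name") or tech.get("name")
        if name:
            rows.append({
                "domain": domain,
                "technology": name,
                "tag": tech.get("Tag") or tech.get("tag") or "",
            })
    return rows


def write_rows(out: TextIO, rows: list[dict]) -> None:
    """Write rows under a fixed header; an empty list gives a header-only CSV."""
    writer = csv.DictWriter(out, fieldnames=TECH_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    nested = {"Results": [{"Result": {"Paths": [
        {"Technologies": [{"Name": "WordPress", "Tag": "cms"}]}]}}]}
    flat = {"technologies": ["Google Analytics"]}
    rows = parse_technologies("examplea.org", nested)
    rows += parse_technologies("exampleb.org", flat)
    rows += parse_technologies("examplec.org", {})  # unknown shape: contributes nothing
    buf = io.StringIO()
    write_rows(buf, rows)
    print(buf.getvalue())
```

Because the header comes from a fixed column list rather than from whatever keys the responses happen to contain, a run where no response yields technologies still produces a valid header-only CSV instead of a KeyError. To write to disk, open the file with `newline=""` and pass the handle to `write_rows`. If the script builds DataFrames instead, passing the same list as `columns=` to `pd.DataFrame` gives an empty frame that still has a `technology` column.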