Parsing phone numbers in Python


phonenumbers 8.13.17

Python version of Google’s common library for parsing, formatting, storing and validating international phone numbers.

License: Apache Software License (Apache License 2.0)

Classifiers

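Development Status :: 5 - Production/Stable
Intended Audience :: Developers
License :: OSI Approved :: Apache Software License
Operating System :: OS Independent
Programming Language :: Python :: 2
Programming Language :: Python :: 2.5
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.3
Programming Language :: Python :: 3.4
Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: Implementation :: CPython
Programming Language :: Python :: Implementation :: PyPy
Topic :: Communications :: Telephony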

Project description

phonenumbers Python Library

This is a Python port of Google’s libphonenumber library. It supports Python 2.5-2.7 and Python 3.x (in the same codebase, with no 2to3 conversion needed).

Installation

Install using pip with: pip install phonenumbers

Example Usage

The main object that the library deals with is a PhoneNumber object. You can create this from a string representing a phone number using the parse function, but you also need to specify the country that the phone number is being dialled from (unless the number is in E.164 format, which is globally unique).

The PhoneNumber object that parse produces typically still needs to be validated, to check whether it’s a possible number (e.g. it has the right number of digits) or a valid number (e.g. it’s in an assigned exchange).

The parse function will also fail completely (with a NumberParseException) on inputs that cannot be uniquely parsed, or that can't possibly be phone numbers.

Once you’ve got a phone number, a common task is to format it in a standardized format. There are a few formats available (under PhoneNumberFormat), and the format_number function does the formatting.

If your application has a UI that allows the user to type in a phone number, it's nice to get the formatting applied as the user types. The AsYouTypeFormatter object allows this.

Sometimes, you’ve got a larger block of text that may or may not have some phone numbers inside it. For this, the PhoneNumberMatcher object provides the relevant functionality; you can iterate over it to retrieve a sequence of PhoneNumberMatch objects. Each of these match objects holds a PhoneNumber object together with information about where the match occurred in the original string.

You might want to get some information about the location that corresponds to a phone number. The geocoder.area_description_for_number does this, when possible.

For more information about the other functionality available from the library, look in the unit tests or in the original libphonenumber project.

Memory Usage
The library includes a lot of metadata, potentially giving a significant memory overhead. There are two mechanisms for dealing with this.

The normal metadata (just over 2 MiB of generated Python code) for the core functionality of the library is loaded on-demand, on a region-by-region basis (i.e. the metadata for a region is only loaded on the first time it is needed).
Metadata for extended functionality is held in separate packages, which therefore need to be explicitly loaded separately. This affects:

The geocoding metadata (~19 MiB), which is held in phonenumbers.geocoder and used by the geocoding functions (geocoder.description_for_number, geocoder.description_for_valid_number or geocoder.country_name_for_number).
The carrier metadata (~1 MiB), which is held in phonenumbers.carrier and used by the mapping functions (carrier.name_for_number or carrier.name_for_valid_number).
The timezone metadata (~100 KiB), which is held in phonenumbers.timezone and used by the timezone functions (time_zones_for_number or time_zones_for_geographical_number).

The phonenumberslite version of the library does not include the geocoder, carrier and timezone packages, which can be useful if you have problems installing the main phonenumbers library due to space/memory limitations.
If you need to ensure that the metadata memory use is accounted for at start of day (i.e. that a subsequent on-demand load of metadata will not cause a pause or memory exhaustion):

Force-load the normal metadata by calling phonenumbers.PhoneMetadata.load_all().
Force-load the extended metadata by importing the appropriate packages (phonenumbers.geocoder, phonenumbers.carrier, phonenumbers.timezone).

Static Typing
The library includes a set of type stub files to support static type checking by library users. These stub files signal the types that should be used, and may also be of use in IDEs which have integrated type checking functionalities.
These files are written for Python 3, and as such type checking the library with these stubs on Python 2.5-2.7 is unsupported.
Project Layout

The python/ directory holds the Python code.
The resources/ directory is a copy of the resources/ directory from libphonenumber. This is not needed to run the Python code, but is needed when upstream changes to the master metadata need to be incorporated.
The tools/ directory holds the tools that are used to process upstream changes to the master metadata.

Scraping Avito without a headless browser
Habr recently published the article "Scraping modern websites without headless browsers", and in the comments someone argued that without a headless browser you cannot get the phone number from a listing on Avito or Youla. I want to refute that: below is a Python script of fewer than 100 lines that successfully scrapes Avito.
I am not a scraping specialist and it is not my job, but I often have to do it to solve work tasks and other problems. For example, I may need to get the account balance from a service (mobile operators) that has no API for it or, sadder still, the list of domains from a registrar (yet another one) that likewise has no API.
Like the article whose comments prompted this post, I use Python and the requests library. When I cannot find an "internal" API, I have to bring in the BeautifulSoup library. Here, though, everything turned out to be much simpler.
If you open the "full" version of the site, https://avito.ru, and try to copy the phone number, it becomes clear that the number is not written on the page as text but drawn as an image. The mobile version of the site, however, returns the number as text. You can verify this by watching the responses in the browser's developer tools when you press the "Call" button.
I will not go through my script in detail; the code has enough comments to show what happens at each stage. In short: it uses the mobile version of the site; it declares variables for the site search plus two variables, "key" and "cookie" (more on them below); it then obtains cookies by opening the main page; next, a loop collects the ids of all listings by walking through every results page. Once all listings have been collected, a second loop goes through them and retrieves the information we are interested in.
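The two-loop flow described above can be sketched without any network access. This is not the author's script: the function names, the shape of the responses, and the stand-in data are all hypothetical, and the real requests are replaced by callables passed in as parameters.

```python
from typing import Callable, Dict, List


def collect_listing_ids(fetch_page: Callable[[int], List[dict]]) -> List[int]:
    """First loop: walk result pages until an empty one, collecting ids."""
    ids: List[int] = []
    page = 1
    while True:
        items = fetch_page(page)
        if not items:  # an empty page means there are no more results
            break
        ids.extend(item["id"] for item in items)
        page += 1
    return ids


def collect_details(ids: List[int], fetch_listing: Callable[[int], dict]) -> List[dict]:
    """Second loop: request each listing and keep the fields of interest."""
    return [fetch_listing(listing_id) for listing_id in ids]


# Hypothetical stand-in data instead of real HTTP responses.
FAKE_PAGES: Dict[int, List[dict]] = {1: [{"id": 101}, {"id": 102}], 2: [{"id": 103}]}
FAKE_LISTINGS: Dict[int, dict] = {
    101: {"id": 101, "phone": "89068547979"},
    102: {"id": 102, "phone": "89128064155"},
    103: {"id": 103, "phone": "89227296192"},
}

listing_ids = collect_listing_ids(lambda page: FAKE_PAGES.get(page, []))
details = collect_details(listing_ids, FAKE_LISTINGS.__getitem__)
print(listing_ids)
print([d["phone"] for d in details])
```

In a real script the two callables would wrap HTTP requests to whatever endpoints the mobile site uses; keeping them as parameters makes the loops easy to test offline.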

It all looks this easy because the right APIs were found. In essence, the script resembles what it would be if it used official APIs. I tried not to add functions, validate responses, or handle exceptions: this is a demonstration of the method, not a production tool, and in my view it is clearer that way. A few checks and handlers are still there. I also tried to fit the script into 100 lines of code.
About the "key" and "cookie" variables: key, as far as I can tell, is static; it is easy to find with a search engine, that is, it is not generated on the fly. I used cookie as a simple "anti-block": it turned out that the verdict that my IP was blocked was not actually true, and supplying fresh cookies is enough for the scraping to continue.
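The "fresh cookies" idea can be illustrated with a small retry helper. This is only a sketch under stated assumptions: the "blocked" flag, the cookie contents, and both callables are hypothetical and stand in for the real requests.

```python
from typing import Callable, Dict


def fetch_with_cookie_refresh(
    send: Callable[[Dict[str, str]], dict],
    new_cookies: Callable[[], Dict[str, str]],
    max_refreshes: int = 2,
) -> dict:
    """Retry a request with fresh cookies while the reply looks blocked."""
    cookies = new_cookies()
    for _ in range(max_refreshes + 1):
        response = send(cookies)
        if not response.get("blocked"):
            return response
        cookies = new_cookies()  # get a fresh set and try again
    raise RuntimeError("still blocked after refreshing cookies")


# Hypothetical stand-ins: the first session is treated as stale.
issued = {"count": 0}


def fake_new_cookies() -> Dict[str, str]:
    issued["count"] += 1
    return {"session": str(issued["count"])}


def fake_send(cookies: Dict[str, str]) -> dict:
    if cookies["session"] == "1":
        return {"blocked": True}
    return {"blocked": False, "data": "ok"}


result = fetch_with_cookie_refresh(fake_send, fake_new_cookies)
print(result)
```

In a real script new_cookies would open the main page again, as the post describes, and send would perform the API request with those cookies attached.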
If there is interest, I can describe in more detail how I looked for the API, or write a similar example for Youla.
Example of parsing a phone number from an image (#python #python3 #phone #captcha #PIL #image_processing)
gil9red/parser-phone-image
C:\Python34\python.exe C:/Users/ipetrash/Projects/parser-phone-image/main.py
"89068547979" -- examples\1.png
"89128064155" -- examples\2.png
"89227296192" -- examples\3.png
"89090999317" -- examples\4.png
"89615750404" -- examples\5.png
...
"89323034541" -- examples\35.png
"89681188182" -- examples\36.png

Process finished with exit code 0

File | Captcha | Parsing result
examples/1.png | [image] | 89068547979
examples/2.png | [image] | 89128064155
examples/3.png | [image] | 89227296192
examples/4.png | [image] | 89090999317
examples/5.png | [image] | 89615750404
examples/6.png | [image] | 89681211551
examples/7.png | [image] | 89030899497
examples/8.png | [image] | 89222321270
examples/9.png | [image] | 89292740993
examples/10.png | [image] | 89090996171
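The recognized strings above are 11-digit numbers in the Russian domestic form with a leading 8. As a minimal sketch, assuming every result has that shape, they could be sanity-checked and converted to E.164 like this. The helper to_e164_ru is hypothetical and not part of the repository.

```python
import re


def to_e164_ru(recognized: str) -> str:
    """Convert an 11-digit number with domestic prefix 8 to +7 form."""
    digits = re.sub(r"\D", "", recognized)  # drop anything that is not a digit
    if len(digits) != 11 or digits[0] != "8":
        raise ValueError("unexpected recognition result: %r" % recognized)
    return "+7" + digits[1:]


for raw in ("89068547979", "89128064155"):
    print(raw, "->", to_e164_ru(raw))
```

A length check of this kind also catches recognition errors in which a digit is dropped or duplicated.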

My repositories with captcha parsers:
