PHP: URL strings

rawurlencode

Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent sign (%) followed by two hexadecimal digits. This is the encoding described in » RFC 3986. It protects literal characters from being interpreted as special URL delimiters, and it protects URLs from being mangled by transmission media that convert characters (as some email systems do).
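A quick illustration of what this means in practice, compared with urlencode(). The input string is arbitrary. A space becomes "%20" rather than "+", and on PHP 5.3 or later "~" is left unencoded:

```php
<?php
$s = 'a b+c~d';
// rawurlencode(): space -> %20, '+' -> %2B, '~' kept as-is (PHP >= 5.3)
echo rawurlencode($s), PHP_EOL; // a%20b%2Bc~d
// urlencode(): space -> '+', and '~' is percent-encoded
echo urlencode($s), PHP_EOL;    // a+b%2Bc%7Ed
```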

Examples

Example #1 Using rawurlencode() to include a password in an FTP URL
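Here is a minimal sketch of this technique. It uses a hypothetical password "foo @+%/" and host "ftp.example.com". Only the password goes through rawurlencode(), so the ":" and "@" separators of the URL itself stay literal:

```php
<?php
// Hypothetical password containing a space and URL delimiters.
$password = 'foo @+%/';
// Encode only the password component, then assemble the URL around it.
$url = 'ftp://user:' . rawurlencode($password) . '@ftp.example.com/x.txt';
echo $url, PHP_EOL; // ftp://user:foo%20%40%2B%25%2F@ftp.example.com/x.txt
```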

Or, if you pass information as part of a URL:

Example #2 Using rawurlencode()
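Here is a sketch with a hypothetical script URL and value. The whole value "sales and marketing/Miami" is meant to be a single path component, so its "/" is encoded as "%2F" along with the spaces:

```php
<?php
// Hypothetical value passed as one path component.
$segment = 'sales and marketing/Miami';
$url = 'http://example.com/department_list_script/' . rawurlencode($segment);
echo $url, PHP_EOL;
// http://example.com/department_list_script/sales%20and%20marketing%2FMiami
```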

See Also

  • rawurldecode() - Decode URL-encoded strings
  • urldecode() - Decodes URL-encoded string
  • urlencode() - URL-encodes string
  • » RFC 3986

User Contributed Notes 25 notes

You can encode paths using:

<?php
$encoded = implode("/", array_map("rawurlencode", explode("/", $path)));
?>
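Here is a short demonstration of that one-liner with a hypothetical path. Each segment is encoded on its own, so the "/" separators survive. Encoding the whole path at once would escape them as "%2F":

```php
<?php
// Hypothetical path containing a space and a '#'.
$path = 'docs/my file#1.txt';
// Split on '/', encode each segment, then rejoin with literal '/'.
$encoded = implode('/', array_map('rawurlencode', explode('/', $path)));
echo $encoded, PHP_EOL;              // docs/my%20file%231.txt
// Encoding the whole path also escapes the separators.
echo rawurlencode($path), PHP_EOL;   // docs%2Fmy%20file%231.txt
```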

I've written a simple function to convert a UTF-8 string to a URL-encoded string. Every character is converted, including ASCII letters and digits!

The function:
<?php
function mb_rawurlencode($url)
{
    $encoded = '';
    $length = mb_strlen($url, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        // Hex-encode every byte of the i-th character as "%xx".
        $encoded .= '%' . wordwrap(bin2hex(mb_substr($url, $i, 1, 'UTF-8')), 2, '%', true);
    }
    return $encoded;
}
?>

Example:
<?php
echo 'http://example.com/',
    mb_rawurlencode('你好');
?>

The above example will output:
http://example.com/%e4%bd%a0%e5%a5%bd
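For comparison, the built-in rawurlencode() already encodes each byte of a multibyte UTF-8 character. The differences are that it emits uppercase hex digits, which RFC 3986 recommends for consistency, and that it leaves unreserved ASCII characters alone. This assumes the source file is saved as UTF-8:

```php
<?php
// Same bytes as mb_rawurlencode() above, but with uppercase hex digits.
echo rawurlencode('你好'), PHP_EOL; // %E4%BD%A0%E5%A5%BD
// ASCII letters are not encoded by the built-in function.
echo rawurlencode('a你'), PHP_EOL;  // a%E4%BD%A0
```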

rawurlencode() MUST not be used on unparsed URLs.

rawurlencode() should not be used on host and domain name parts. These may include international characters, which IDNA encodes in each domain label with an "xn--" prefix followed by a Punycode encoding of the label.

rawurlencode() may be used on usernames and passwords separately (so that it won’t encode the ‘:’ and ‘@’ separators).

rawurlencode() must not be used on whole paths (which may contain '/' separators). The ['path'] element of a parsed URL must first be exploded into individual "directory" names. A directory or file name that contains a space must not be encoded with urlencode() but with rawurlencode(), so that the space appears as a '%20' hex sequence (not '+').

rawurlencode() must not be used to encode the [‘query’] element of a parsed URL. Instead you must use the urlencode() function:

Typical queries use the '&' separator between parameters. This '&' separator is, however, just a convention of the application/x-www-form-urlencoded format that HTML forms use with the default GET method. When an HTML page references a URL that contains static query parameters, these '&' separators should be written in the HTML code as '&amp;' for HTML conformance. This is not part of the URL specification, but of the HTML encapsulation! Some browsers forget this and send '&amp;' in their HTTP GET query. You may wish to replace '&amp;' with '&' when parsing and validating URLs. This should be done BEFORE calling urlencode() on query parts.
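Here is a minimal sketch of that order of operations. It assumes the query string arrives with raw, not-yet-encoded parameter values, and the input string is hypothetical. The '&amp;' entities are replaced first, and then each name and value goes through urlencode():

```php
<?php
// Hypothetical query as sent by a browser that kept the HTML entity.
$query = 'a=1&amp;b=hello world';
// 1) Undo the HTML escaping of the separator.
$query = str_replace('&amp;', '&', $query);
// 2) Split into name/value pairs and encode each part separately.
$pairs = [];
foreach (explode('&', $query) as $pair) {
    [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
    $pairs[] = urlencode($name) . '=' . urlencode($value);
}
$encoded = implode('&', $pairs);
echo $encoded, PHP_EOL; // a=1&b=hello+world
```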

The [‘fragment’] part of a parsed URL (after the first ‘#’ separator found in any URL) must not be encoded with this rawurlencode() function but instead by urlencode().

Validating a URL sent in an HTTP request is therefore more complicated than you may think. It must be done only on parsed URLs (where the basic elements of the URL have been split apart). Then you must explode the path components and check for '&amp;' sequences in the query or fragment parts.
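Here is a sketch of the parsing step with PHP's parse_url(). The URL is a hypothetical example that still contains an HTML-escaped '&amp;' in its query. The path is exploded into segments, and the query and fragment are checked for leftover entities:

```php
<?php
// Hypothetical URL taken from a request.
$url = 'http://www.example.com/my%20dir/page.html?a=1&amp;b=2#top';
$parts = parse_url($url);
// Work on individual path segments, never on the whole path.
$segments = explode('/', $parts['path']);              // ['', 'my%20dir', 'page.html']
$decodedSegments = array_map('rawurldecode', $segments); // ['', 'my dir', 'page.html']
// Flag HTML-escaped separators left in the query or fragment.
$hasEntity = false;
foreach (['query', 'fragment'] as $key) {
    if (isset($parts[$key]) && strpos($parts[$key], '&amp;') !== false) {
        $hasEntity = true;
    }
}
var_dump($hasEntity); // bool(true)
```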


The next thing to do is to check the URL scheme that you want to support (for example, only ‘http’, ‘https’, or ‘ftp’).

You may wish to check that the ['port'] part is really a decimal integer between 1 and 65535.
You may wish to remove the default port number of the URL schemes you support (for example port '80' for 'http', port '21' for 'ftp', and port '443' for 'https'). You may also severely restrict all port numbers below 1024, or at least some critical ports below 140 (this includes the DNS and NetBIOS ports).
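Here is a sketch of the scheme and port checks. The accepted-scheme table, the helper name normalize_scheme_and_port(), and the decision to reject rather than merely "restrict" non-default ports below 1024 are all assumptions made for illustration:

```php
<?php
// Hypothetical policy: accepted schemes and their default ports.
const DEFAULT_PORTS = ['http' => 80, 'https' => 443, 'ftp' => 21];

// Returns the parse_url() parts with a lowercase scheme and no redundant
// default port, or null if the URL fails these (partial) checks.
function normalize_scheme_and_port(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $scheme = strtolower($parts['scheme']);
    if (!array_key_exists($scheme, DEFAULT_PORTS)) {
        return null; // scheme not supported
    }
    $parts['scheme'] = $scheme;
    if (isset($parts['port'])) {
        $port = $parts['port']; // parse_url() returns the port as an int
        if ($port < 1 || $port > 65535) {
            return null;
        }
        if ($port === DEFAULT_PORTS[$scheme]) {
            unset($parts['port']); // drop the redundant default port
        } elseif ($port < 1024) {
            return null; // "restrict severely" taken here as "reject"
        }
    }
    return $parts;
}
```

For example, "HTTP://www.example.com:80/x" comes back with scheme "http" and no port. "http://example.com:22/" is rejected.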

Then you may wish to severely control the ['host'] part (a full host domain name or an IP address), for example by rejecting host names that:
  • don't contain at least one dot;
  • start with a dot, or contain two consecutive dots;
  • start or end with a '-' dash, or contain '.-' or '-.' (invalid in all domain names);
  • contain two dashes anywhere other than at the third and fourth characters of a domain label, or contain two dashes that are not followed by at least one other character;
  • have a one-character top-level domain. An old rule of thumb also rejected TLDs longer than 6 characters, since ".museum" was once the longest TLD, but much longer TLDs exist now.
If the "TLD" is a pure integer, check that each number is between 0 and 255. In that case, also check that the host is a valid IPv4 address by comparing it to long2ip(ip2long($host)).
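Here is a sketch of a few of these host checks. looks_like_valid_host() is a hypothetical helper, not a complete validator. It does not handle IPv6 literals or decode IDNA ("xn--") labels, and it accepts only ASCII letters, digits, and dashes in labels:

```php
<?php
function looks_like_valid_host(string $host): bool
{
    // Digits and dots only: treat as dotted IPv4 and require an exact
    // round trip through ip2long()/long2ip().
    if (preg_match('/^[0-9.]+$/', $host)) {
        $long = ip2long($host);
        return $long !== false && long2ip($long) === $host;
    }
    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return false; // no dot at all
    }
    foreach ($labels as $label) {
        // An empty label means a leading/trailing dot or two consecutive dots.
        if ($label === '' || !preg_match('/^[A-Za-z0-9-]+$/', $label)) {
            return false;
        }
        // No leading or trailing dash (this also rules out ".-" and "-.").
        if ($label[0] === '-' || substr($label, -1) === '-') {
            return false;
        }
        // "--" is only tolerated at the 3rd-4th characters, as in "xn--".
        $pos = strpos($label, '--');
        if ($pos !== false && ($pos !== 2 || strpos($label, '--', 4) !== false)) {
            return false;
        }
    }
    // Reject one-character and purely numeric top-level domains.
    $tld = end($labels);
    return strlen($tld) >= 2 && !preg_match('/^[0-9]+$/', $tld);
}
```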

Once this is done, you must use the rawurlencode() function on all parts up to and including the exploded path elements, and urlencode() on the query and fragment parts, according to the specs, to recreate a complete and validated URL.
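Here is a sketch of that rebuilding step. It follows the per-part rules given earlier in this note: rawurlencode() for each path segment, and urlencode() for the query and the fragment. The helper build_url() and its argument layout are hypothetical. The host is assumed to be already validated and ASCII, and all components are assumed to be in decoded form:

```php
<?php
function build_url(string $scheme, string $host, array $pathSegments,
                   array $query = [], string $fragment = ''): string
{
    $url = $scheme . '://' . $host;
    // Path: encode each segment so '/' separators stay literal.
    $url .= '/' . implode('/', array_map('rawurlencode', $pathSegments));
    if ($query) {
        $pairs = [];
        foreach ($query as $name => $value) {
            $pairs[] = urlencode((string) $name) . '=' . urlencode((string) $value);
        }
        $url .= '?' . implode('&', $pairs);
    }
    if ($fragment !== '') {
        $url .= '#' . urlencode($fragment);
    }
    return $url;
}

echo build_url('http', 'www.example.com', ['my dir', 'a+b.html'], ['q' => 'x y'], 'sec 2'), PHP_EOL;
// http://www.example.com/my%20dir/a%2Bb.html?q=x+y#sec+2
```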

Note that RFC 1738 has been amended (by RFC 2732):
"[" and "]" are no longer considered unsafe, but are now considered "reserved", meaning that they CAN be used in URLs!

Currently this usage has only been allowed in the hostname part, but there are some proposals to allow such use in some URL schemes. Similar extensions are now found that use the "<>" characters as "reserved" characters with special semantics, instead of "unsafe" characters that must be URL-encoded.

Note also that some characters are currently "reserved" but should instead have been considered "unsafe". This includes the parentheses "()", which are clearly unsafe when a URL is used in MIME headers.

Because of this, if a valid URL contains "()" characters, one should use an upper-level encoding. For example, the URL can be enclosed in a pair of characters that are unsafe in URLs and serve as delimiters in the upper-level protocol, such as a "<>" pair in MIME headers, since these characters cannot appear unencoded in a valid URL.

With phpversion() >= 5.3, rawurlencode() is compliant with RFC 3986 (it no longer encodes "~"). With phpversion() <= 5.2.7RC1, it is not compliant with RFC 3986.

RFC 1738 section 2.2
only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.


RFC 2396 section 2.3
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

RFC 2732 section 3
(3) Add "[" and "]" to the set of 'reserved' characters:

RFC 3986 section 2.3
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

RFC 3987 section 2.2
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
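One way to check the RFC 3986 compliance claim on a given PHP build is to list every printable ASCII character that rawurlencode() leaves unchanged. On PHP 5.3 or later, the result should be exactly the RFC 3986 "unreserved" set:

```php
<?php
// Collect the printable ASCII characters that rawurlencode() does not encode.
$unchanged = '';
for ($c = 0x20; $c <= 0x7E; $c++) {
    $ch = chr($c);
    if (rawurlencode($ch) === $ch) {
        $unchanged .= $ch;
    }
}
echo $unchanged, PHP_EOL;
// -.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
```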

1) About "reserved" characters in URLs:

Beware that RFC 1738 specifies that the characters "{", "}", "|", "\", "^", "~", "[", "]", and "`" are all considered unsafe and SHOULD be URL-encoded with a "%xx" triplet within *ALL* URLs.

However, some HTTP URLs use the "~" character as a prefix for a user account, for example:
http://www.any.host.domain/~user/subpath/page.html?query#fragment

This usage is acceptable, but the RFC specifies that "%7E" should be used instead of "~" in the path component. HTTP servers should accept "~" as equivalent to "%7E", and according to the RFC, the "%7E" form should be the canonical one.

However, some HTTP servers do not fully comply with this RFC and treat "%7E" differently from "~". They consider it part of a literal path component name and search for a directory whose name contains a "~" character, instead of mapping the "~user" path component to that user's directory. In that case, these non-compliant HTTP servers will not find the resource associated with that URL and may return a 404 error or another error, such as access denied.

When using rawurlencode() on such HTTP URLs, it's best to account for this legacy usage by running str_replace() on the result to convert "/%7E" back to "/~". That way, the URLs map correctly to these servers' legacy use of the "~" character. Compliant HTTP servers treat the "~" character as equivalent to the recommended "%7E" form, so they will canonicalize it automatically.
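Here is a sketch of that workaround with a hypothetical path. On PHP 5.3 and later, rawurlencode() already leaves "~" unencoded, so the str_replace() call is a harmless no-op there. It only matters on older versions:

```php
<?php
// Hypothetical legacy per-user path.
$path = '/~user/sub path/page.html';
// Encode segment by segment so the '/' separators stay literal.
$encoded = implode('/', array_map('rawurlencode', explode('/', $path)));
// Restore a leading '~' if this PHP version encoded it as %7E.
$encoded = str_replace('/%7E', '/~', $encoded);
echo $encoded, PHP_EOL; // /~user/sub%20path/page.html
```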

2) Encoding of hostnames in URLs

Finally, beware that host domain name parts in URLs *MUST NOT* be encoded with rawurlencode(). The "[" and "]" characters are valid delimiters that *MUST* be used to reference an IPv6 address, or other hostnames that don't fit the restricted set of characters allowed in a host name. They MUST be used if the hostname includes characters such as ":", which is otherwise used to specify an alternate, non-default port number.

International domain names use yet another encoding, IDNA, which converts each Unicode label to ASCII with the Punycode algorithm and an "xn--" prefix. This encoding must be applied only to individual labels (the parts separated by "." characters), and it does not use any "%xx" triplets.

So NEVER use urlencode() or rawurlencode() on an unparsed URL, unless this full URL is part of a query parameter string!
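Here is a sketch of the one legitimate case: a whole URL passed as a query parameter value. The redirect script name and the target URL are hypothetical. The target is encoded with urlencode(), and decoding the query string restores it unchanged:

```php
<?php
// Hypothetical target URL carried as a single parameter value.
$target = 'http://a.example.com/x?y=1&z=2';
$link = 'http://www.example.com/redirect.php?to=' . urlencode($target);
echo $link, PHP_EOL;
// http://www.example.com/redirect.php?to=http%3A%2F%2Fa.example.com%2Fx%3Fy%3D1%26z%3D2

// Decoding the query string gives the original URL back.
parse_str(parse_url($link, PHP_URL_QUERY), $params);
echo $params['to'], PHP_EOL; // http://a.example.com/x?y=1&z=2
```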

3) Encoding of usernames/passwords in URLs:

There is no standard way to specify a password in a URL. There is a legacy usage of the ":" character to separate a username from a password, but it is strongly discouraged. The RFC does not attempt to specify semantics for the authentication part of a URL (the part before the "@" character and the hostname).


If you need to encode a password, always use rawurlencode() on the username and the password separately, and then insert the ":" character between the two components. Don't use urlencode(). It encodes a space as "+", which does not work because usernames and passwords treat "+" and spaces as different characters!
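Here is a sketch with hypothetical credentials. Each component is encoded on its own before the ":" and "@" separators are added. The last line shows the pitfall of urlencode(): a decoder that uses rawurldecode() reads the "+" back as a literal plus sign, so the password changes:

```php
<?php
// Hypothetical credentials containing URL separators and a space.
$user = 'john@example';
$pass = 'p:ss word+1';
$url = 'ftp://' . rawurlencode($user) . ':' . rawurlencode($pass) . '@ftp.example.com/';
echo $url, PHP_EOL;
// ftp://john%40example:p%3Ass%20word%2B1@ftp.example.com/

// urlencode() turns the space into '+', which rawurldecode() keeps as '+'.
echo rawurldecode(urlencode($pass)), PHP_EOL; // p:ss+word+1
```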


URL Functions

Note that $_SERVER["HTTP_REFERER"] may not include GET data that was included in the referring address, depending on the browser. So if you rely on GET variables to generate a page, it's not a good idea to use HTTP_REFERER to smoothly "bounce" someone back to the page he/she came from.

Just a side note to the above: you will need to add the "?" yourself.

Note also that the URL shown in $HTTP_REFERER is not always the URL of the web page where the user clicked to invoke the PHP script.
It may instead be a document on your own web site that contains an HTML element with an attribute referencing the script. Note also that the current page fragment (#anchor) may or may not be transmitted with the URL, depending on the browser.
Examples:

In such a case, browsers should transmit the URL of the container document, but some still use the previous document in the browser history. This can cause a different $HTTP_REFERER value to be sent when the user comes back to the document referencing your script. If you want to be sure that the actual current document, or the previous document in the history, is sent, use client-side JavaScript to send it to your script:

And then check the value of $js in your page script to generate appropriate content when the remote user agent does not support client-side scripts (such as most index/scan robots, some old or special simplified browsers, or browsers with JavaScript disabled by their users).

The following method does not show the URL displayed in the user's browser (as the author claimed) if the code resides in the source page of a FRAME or IFRAME (say SRC="sourcepage.php"). In that case, the URL of the source page is displayed.

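// $HTTP_HOST and $REQUEST_URI only existed with register_globals (removed in PHP 5.4).
$url = sprintf("%s%s%s", "http://", $_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
echo $url;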

To check whether a URL is valid, try to fopen() it. If fopen() fails (returns false), PHP could not open the URL you asked for. This is usually because the URL is not valid, although the host may also be unreachable, or URL access may be disabled by allow_url_fopen.

When using a multiple select on a form, I ran into a little issue: I only received the last value from the select box.
I had a select box named organization_id with two values (92 and 93).
To get both values, I had to use the following:

// split() was removed in PHP 7; explode() does the same job for a literal "&".
$temp_array = explode("&", $_SERVER['QUERY_STRING']);
foreach ($temp_array as $key => $value) {
    if (substr($value, 0, 15) == "organization_id") {
        $_GET['organizations'][] = substr($value, 15, strlen($value));
    }
}

This results in a $_GET array like this:

Array
(
    [page] => idea_submission
    [organization_id] => 93
    [organizations] => Array
        (
            [0] => =92
            [1] => =93
        )

)
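The behavior behind this issue can be reproduced with parse_str(), which parses a query string the same way PHP builds $_GET. With a plain repeated name, the last value wins. If the select is instead named with a "[]" suffix (name="organization_id[]"), PHP collects every value into an array, and no manual parsing is needed. The query strings below are hypothetical:

```php
<?php
// Same plain name twice: only the last value survives.
parse_str('organization_id=92&organization_id=93', $plain);
var_dump($plain['organization_id']); // string(2) "93"

// With a "[]" suffix, PHP builds an array of all values.
parse_str('organization_id[]=92&organization_id[]=93', $bracketed);
$ids = $bracketed['organization_id']; // ['92', '93']
```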
