Comment valider une adresse email en utilisant une expression régulière?

Au fil des années, j’ai développé lentement une expression régulière qui valide correctement les adresses électroniques MOST, en supposant qu’elles n’utilisent pas d’adresse IP comme partie serveur.

Je l’utilise dans plusieurs programmes PHP, et cela fonctionne la plupart du temps. Cependant, de temps en temps, je suis contacté par quelqu’un qui rencontre des problèmes avec un site qui l’utilise, et je dois faire quelques ajustements (plus récemment, j’ai réalisé que je n’autorisais pas les TLD à 4 caractères).

Quelle est la meilleure expression régulière que vous avez ou avez vue pour valider les emails?

J’ai vu plusieurs solutions qui utilisent des fonctions qui utilisent plusieurs expressions plus courtes, mais je préfère avoir une expression longue et complexe dans une fonction simple au lieu de plusieurs expressions courtes dans une fonction plus complexe.

La regex conforme à la norme RFC 822 est inefficace et obscure en raison de sa longueur. Heureusement, la RFC 822 a été remplacée deux fois et la spécification actuelle pour les adresses électroniques est RFC 5322 . La RFC 5322 conduit à une regex qui peut être comprise si elle est étudiée pendant quelques minutes et qui est suffisamment efficace pour une utilisation réelle.

Une regex conforme à la norme RFC 5322 se trouve en haut de la page sur http://emailregex.com/ mais utilise le modèle d’adresse IP qui flotte sur Internet avec un bogue autorisant 00 pour toutes les valeurs décimales d’octets non signés dans une adresse délimitée par des points, ce qui est illégal. Le rest semble être compatible avec la grammaire RFC 5322 et passe plusieurs tests avec grep -Po , y compris les noms de domaine, les adresses IP, les mauvais et les noms de compte avec et sans guillemets.

En corrigeant le bug 00 dans le modèle IP, on obtient une expression rationnelle qui fonctionne et qui est assez rapide. (Raclez la version rendue, pas la balise, pour le code actuel.)

(?: [a-z0-9! # $% & ‘* + / =? ^ _ `{|} ~ -] + (?: \. [a-z0-9! # $% &’ * + / =? ^ _ `{|} ~ -] +) * |” (?: [\ x01- \ x08 \ x0b \ x0c \ x0e- \ x1f \ x21 \ x23- \ x5b \ x5d- \ x7f] | \\ [\ x01- \ x09 \ x0b \ x0c \ x0e- \ x7f]) * “) @ (?: (?: [a-z0-9] (?: [a-z0-9 -] * [a-z0 -9])? \.) + [A-z0-9] (?: [A-z0-9 -] * [a-z0-9])? | \ [(? :(? 🙁 2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9])) \.) {3} ( ? 🙁 2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9]) | a-z0-9 -] * [a-z0-9]: (?: [\ x01- \ x08 \ x0b \ x0c \ x0e- \ x1f \ x21- \ x5a \ x53- \ x7f] | \\ [\ x01- \ x09 \ x0b \ x0c \ x0e- \ x7f]) +) \])

Voici le schéma de la machine à états finis pour regexp ci-dessus qui est plus clair que regexp lui-même entrer la description de l'image ici

Les patterns plus sophistiqués en Perl et PCRE (librairie regex utilisée par exemple en PHP) permettent d’ parsingr correctement RFC 5322 . Python et C # peuvent le faire aussi, mais ils utilisent une syntaxe différente de ces deux premiers. Cependant, si vous êtes obligé d’utiliser l’un des nombreux langages moins puissants, il est préférable d’utiliser un véritable parsingur.

Il est également important de comprendre que la validation selon la RFC ne vous dit absolument pas si cette adresse existe réellement sur le domaine fourni ou si la personne qui entre l’adresse est son véritable propriétaire. Les gens signent les autres aux listes de diffusion de cette façon tout le temps. La correction qui nécessite un type de validation plus sophistiqué implique l’envoi d’un message contenant un jeton de confirmation destiné à être saisi sur la même page Web que l’adresse.

Les jetons de confirmation sont le seul moyen de savoir que vous avez l’adresse de la personne qui l’a saisie. C’est pourquoi la plupart des listes de diffusion utilisent désormais ce mécanisme pour confirmer les inscriptions. Après tout, tout le monde peut mettre president@whitehouse.gov , et cela sera même légal, mais ce n’est pas la personne à l’autre bout.

Pour PHP, vous ne devez pas utiliser le modèle indiqué dans Valider une adresse e-mail avec PHP, ce que je cite:

Un usage courant et un codage bâclé répandu risquent de créer une norme de facto pour les adresses électroniques plus ressortingctive que la norme formelle enregistrée.

Ce n’est pas mieux que tous les autres modèles non-RFC. Ce n’est même pas assez intelligent pour gérer même RFC 822 , sans parler de RFC 5322. Celui-ci , cependant, est.

Si vous voulez être chic et pédant, implémentez un moteur d’état complet . Une expression régulière ne peut agir que comme un filtre rudimentaire. Le problème avec les expressions régulières est que dire à quelqu’un que son adresse de courrier électronique parfaitement valide est invalide (un faux positif) car votre expression régulière ne peut pas la gérer est simplement grossière et impolie du sharepoint vue de l’utilisateur. Un moteur d’état à cette fin peut à la fois valider et même corriger les adresses de messagerie qui seraient autrement considérées comme non valides car elles démontent l’adresse de messagerie en fonction de chaque RFC. Cela permet une expérience potentiellement plus agréable, comme

L’adresse e-mail spécifiée ‘myemail @ address, com’ n’est pas valide. Voulez-vous dire ‘myemail@address.com’?

Voir également Validation des adresses électroniques , y compris les commentaires. Ou comparer l’adresse de messagerie en validant les expressions régulières .

Visualisation régulière des expressions

Debuggex Demo

Vous ne devez pas utiliser des expressions régulières pour valider les adresses électroniques.

Au lieu de cela, utilisez la classe MailAddress , comme ceci:

 try { address = new MailAddress(address).Address; } catch(FormatException) { //address is invalid } 

La classe MailAddress utilise un parsingur BNF pour valider l’adresse conformément à la RFC822.

Si vous voulez vraiment utiliser une expression régulière, la voici :

  (?: (?: \ r \ n)? [\ t]) * (?: (?: (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031 ] + (?: (?: (?: \ r \ n)? [\ t]
 ) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ R \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(? :( ?:
 \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \ \ ". \ [\] \ 000- \ 031] + (? :(? :(
 ?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [ ^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [ 
 \ t])) * "(?: (?: \ r \ n)? [\ t]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 0
 31] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\ ]])) | \ [([^ \ [\] \ r \\] | \\.) * \
 ] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] +
 (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]]) ) | \ [([^ \ [\] \ r \\] | \\.) * \] (?:
 (?: \ r \ n)? [\ t]) *)) * | (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z
 | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\. | (? :( ?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)
 ? [\ t]) *) * \ <(?: (?: \ r \ n)? [\ t]) * (?: @ (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \
 r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,:: \\". \ [\]])) | \ [([^ \ [\ ] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [
  \ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)
 ? [\ t]) + | \ Z | (? = [\ ["() <> @,:: \\". \ [\]]) | \ [([^ \ [\] \ r \ \] | \\.) * \] (?: (?: \ r \ n)? [\ t]
 ) *)) * (?:, @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [
  \ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *
 ) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031 ] + (?: (?: (?: \ r \ n)? [\ t]
 ) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \\ .) * \] (?: (?: \ r \ n)? [\ t]) *)) *)
 *: (?: (?: \ r \ n)? [\ t]) *)? (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) +
 | \ Z | (? = [\ ["() <> @,:: \\". \ [\]])) | "(?: [^ \" \ R \\] | \\. | ( ?: (?: \ r \ n)? [\ t])) * "(?: (?:
 \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ " . \ [\] \ 000- \ 031] + (? :(? :( ?:
 \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,:: \\". \ [\]])) | "(?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t
 ])) * "(?: (?: \ r \ n)? [\ t]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031
 ] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\] ])) | \ [([^ \ [\] \ r \\] | \\.) * \] (
 ?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?
 : (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]]) | \ [([^ \ [\] \ r \\] | \\.) * \] (? :(?
 : \ r \ n)? [\ t]) *)) * \> (?: (?: \ r \ n)? [\ t]) *) | (?: [^ () <> @ ,; : \\ ". \ [\] \ 000- \ 031] + (? :(?
 : (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(? : [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)?
 [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) *: (?: (?: \ r \ n)? [\ t]) * (?: (?: (?: [^ () <> @,;: \\ ". \ [\] 
 \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,:: \\" . \ [\]])) | "(?: [^ \" \ r \\] |
 \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) (?: \. (? : (?: \ r \ n)? [\ t]) * (?: [^ () <>

 @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [ "() <> @,;: \\". \ [\]])) | "
 (?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) *" (?: (?: \ r \ n)? [ \ t]) *)) * @ (?: (?: \ r \ n)? [\ t]
 ) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,:: \\
 ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) * ) (?: \. (?: (?: \ r \ n)? [\ t]) * (?
 : [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [
 \]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * | (?: [^ () <> @,;: \\ ". \ [\] \ 000-
 \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [ \]])) | "(?: [^ \" \ r \\] | \\. | (
 ?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) * \ <(?: (?: \ r \ n)? [\ t]) * (?: @ (?: [^ () <> @ ,;
 : \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [" () <> @,;: \\ ". \ [\]])) | \ [([
 ^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ "
 . \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @, ;: \\ ". \ [\]])) | \ [([^ \ [\
 ] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * (?:, @ (?: (?: \ r \ n )? [\ t]) * (?: [^ () <> @,;: \\ ". \
 [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,::: \\ ". \ [\]])) | \ [([^ \ [\] \
 r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] 
 \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,:: \\" . \ [\]])) | \ [([^ \ [\] \ r \\]
 | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) *) *: (?: (?: \ r \ n)? [\ t]) * )? (?: [^ () <> @,;: \\ ". \ [\] \ 0
 00- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,:: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\
 . | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) (?: \. (? :( ?: \ r \ n)? [\ t]) * (?: [^ () <> @,
 ;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [" ( ) <> @;;: \\ ". \ [\]])) |" (?
 : [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) *" (?: (?: \ r \ n)? [\ t ]) *)) * @ (?: (?: \ r \ n)? [\ t]) *
 (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\".
 \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) ( ?: \. (?: (?: \ r \ n)? [\ t]) * (?: [
 ^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | ( ? = [\ ["() <> @,;: \\". \ [\]
 ])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * \> ( ?: (?: \ r \ n)? [\ t]) *) (?:, \ s * (
 ?: (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,:: \\
 ". \ [\]])) |" (?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) *" (? : (?: \ r \ n)? [\ t]) *) (?: \. (? :(
 ?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (? :(? :(? : \ r \ n)? [\ t]) + | \ Z | (? = [
 \ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t
 ]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]
 ]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \ \.) * \] (?: (?: \ r \ n)? [\ t]) *) (?
 : \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + ( ?: (?: (?: \ r \ n)? [\ t]) + |
 \ Z | (? = [\ ["() <> @,;:: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * | (?:
 [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\
 ]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) * \ <(?: (?: \ r \ n)
 ? [\ t]) * (?: @ (?: [^ () <> @;;:: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["
 () <> @,;: \\ ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)
 ? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <>

 @,;: \\ ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * (?:, @ (?: (?: \ r \ n)? [
  \ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,
 ;: \\ ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]
 ) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,:: \\
 ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) * )) *) *: (?: (?: \ r \ n)? [\ t]) *)?
 (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\".
 \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(? :( ?: \ r \ n)? [\ t]) *) (?: \. (? :( ?:
 \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ [
 "() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n) ? [\ t])) * "(?: (?: \ r \ n)? [\ t])
 *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t])
 + | \ Z | (? = [\ ["() <> @,:: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \\. ) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \
 . (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z
 | (? = [\ ["() <> @,;:: \\". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * \> (? :(
 ?: \ r \ n)? [\ t]) *)) *)?; \ s *) 

Cette question est souvent posée, mais je pense que vous devriez revenir en arrière et vous demander pourquoi vous souhaitez valider syntaxiquement les adresses e-mail? Quel est le bénéfice vraiment?

  • Il ne sera pas attraper les fautes de frappe communes.
  • Cela n’empêche pas les gens d’entrer des adresses électroniques non valides ou inventées ou d’entrer l’adresse d’un tiers.

Si vous souhaitez valider qu’un e-mail est correct, vous n’avez pas d’autre choix que d’envoyer un e-mail de confirmation et de demander à l’utilisateur d’y répondre. Dans de nombreux cas, vous devrez envoyer un mail de confirmation pour des raisons de sécurité ou pour des raisons éthiques (vous ne pouvez donc pas, par exemple, signer un service contre leur gré).

Cela dépend de ce que vous entendez par le mieux: si vous parlez d’attraper chaque adresse email valide, utilisez ce qui suit:

 (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*) 

( http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html ) Si vous cherchez quelque chose de plus simple, mais que la plupart des adresses e-mail sont valides, essayez quelque chose comme:

 "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" 

EDIT: À partir du lien:

Cette expression régulière ne validera que les adresses pour lesquelles des commentaires ont été supprimés et remplacés par des espaces (cela est fait par le module).

Tout dépend de la précision que vous voulez être. Pour mes fins, où j’essaie juste de garder des choses comme bob @ aol.com (espaces dans les emails) ou steve (pas de domaine du tout) ou mary@aolcom (pas de période avant .com), j’utilise

 /^\S+@\S+\.\S+$/ 

Bien sûr, cela correspondra à des choses qui ne sont pas des adresses électroniques valides, mais il s’agit de jouer la règle des 90/10.

[UPDATED] J’ai rassemblé ici tout ce que je sais sur la validation des adresses e-mail: http://isemail.info , qui non seulement valide mais diagnostique également les problèmes liés aux adresses e-mail. Je suis d’accord avec beaucoup de commentaires ici que la validation n’est qu’une partie de la réponse; voir mon essai à http://isemail.info/about .

is_email () rest, à ma connaissance, le seul validateur qui vous indiquera définitivement si une chaîne donnée est une adresse email valide ou non. J’ai téléchargé une nouvelle version sur http://isemail.info/

J’ai rassemblé des cas de test de Cal Henderson, Dave Child, Phil Haack, Doug Lovell, RFC5322 et RFC 3696. 275 adresses de test au total. J’ai effectué tous ces tests contre tous les validateurs gratuits que j’ai pu trouver.

Je vais essayer de garder cette page à jour, car les gens améliorent leurs validateurs. Merci à Cal, Michael, Dave, Paul et Phil pour leur aide et leur coopération dans la compilation de ces tests et la critique constructive de mon propre validateur .

Les gens devraient être au courant des errata contre la RFC 3696 en particulier. Trois des exemples canoniques sont en fait des adresses non valides. Et la longueur maximale d’une adresse est 254 ou 256 caractères, pas 320.

Selon les spécifications HTML5 du W3C :

 ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$ 

Le contexte:

Une adresse électronique valide est une chaîne qui correspond à la production ABNF […].

Remarque: Cette exigence constitue une violation volontaire de la RFC 5322 , qui définit une syntaxe pour les adresses de messagerie à la fois trop ssortingcte (avant le caractère «@»), trop vague (après le caractère «@») et trop laxiste ( permettre des commentaires, des caractères d’espacement et des chaînes de caractères dans des manières peu familières à la plupart des utilisateurs).

L’expression régulière compatible JavaScript et Perl suivante est une implémentation de la définition ci-dessus.

 /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

C’est facile en Perl 5.10 ou plus récent:

 /(?(DEFINE) (?
(?&mailbox) | (?&group)) (? (?&name_addr) | (?&addr_spec)) (? (?&display_name)? (?&angle_addr)) (? (?&CFWS)? < (?&addr_spec) > (?&CFWS)?) (? (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ; (?&CFWS)?) (? (?&phrase)) (? (?&mailbox) (?: , (?&mailbox))*) (? (?&local_part) \@ (?&domain)) (? (?&dot_atom) | (?&quoted_ssortingng)) (? (?&dot_atom) | (?&domain_literal)) (? (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?) (? (?&dtext) | (?&quoted_pair)) (? (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e]) (? (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|}~]) (? (?&CFWS)? (?&atext)+ (?&CFWS)?) (? (?&CFWS)? (?&dot_atom_text) (?&CFWS)?) (? (?&atext)+ (?: \. (?&atext)+)*) (? [\x01-\x09\x0b\x0c\x0e-\x7f]) (? \\ (?&text)) (? (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e]) (? (?&qtext) | (?&quoted_pair)) (? (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))* (?&FWS)? (?&DQUOTE) (?&CFWS)?) (? (?&atom) | (?&quoted_ssortingng)) (? (?&word)+) # Folding white space (? (?: (?&WSP)* (?&CRLF))? (?&WSP)+) (? (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e]) (? (?&ctext) | (?&quoted_pair) | (?&comment)) (? \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) ) (? (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS))) # No whitespace control (? [\x01-\x08\x0b\x0c\x0e-\x1f\x7f]) (? [A-Za-z]) (? [0-9]) (? \x0d \x0a) (? ") (? [\x20\x09]) ) (?&address)/x

I use

 ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$ 

Which is the one used in ASP.NET by the RegularExpressionValidator.

Don’t know about best, but this one is at least correct, as long as the addresses have their comments ssortingpped and replaced with whitespace.

Sérieusement. You should use an already written library for validating emails. The best way is probably to just send a verification e-mail to that address.

The email addresses I want to validate are going to be used by an ASP.NET web application using the System.Net.Mail namespace to send emails to a list of people. So, rather than using some very complex regular expression, I just try to create a MailAddress instance from the address. The MailAddress construtor will throw an exception if the address is not formed properly. This way, I know I can at least get the email out of the door. Of course this is server-side validation but at a minimum you need that anyway.

 protected void emailValidator_ServerValidate(object source, ServerValidateEventArgs args) { try { var a = new MailAddress(txtEmail.Text); } catch (Exception ex) { args.IsValid = false; emailValidator.ErrorMessage = "email: " + ex.Message; } } 

Quick answer

Use the following regex for input validation:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

Addresses matched by this regex:

  • have a local part (ie the part before the @-sign) that is ssortingctly compliant with RFC 5321/5322,
  • have a domain part (ie the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

The second constraint is a ressortingction on RFC 5321/5322.

Elaborate answer

Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there’s no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn’t refer to an existing mailbox.

The syntax of email addresses has been defined in various RFCs , most notably RFC 822 and RFC 5322 . RFC 822 should be seen as the “original” standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have ressortingcted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

In this answer I’ll take “email address” to mean addr-spec as defined in the RFCs (ie jdoe@example.org , but not "John Doe" , nor some-group:jdoe@example.org,mrx@exampel.org; ).

There’s one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can’t be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I’ll disregard comments and only consider proper regular expressions.

The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation . Thus, the canonical representation of first. last (comment) @ [3.5.7.9] is first.last@[3.5.7.9] .

Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

I use POSIX “extended” regular expressions in this answer, assuming an ASCII compatible character set.

RFC 822

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I’ll try to fix the expression as soon as possible.

([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

I believe it’s fully complient with RFC 822 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked “erratum”) and use the updated version as a subexpression in subsequent regular expressions.

As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I’ve expanded the expressions to accommodate this rule and marked the result with “opt-lwsp”.

 CHAR =  =~ . CTL =  =~ [\x00-\x1F\x7F] CR =  =~ \r LF =  =~ \n SPACE =  =~ HTAB =  =~ \t <"> =  =~ " CRLF = CR LF =~ \r\n LWSP-char = SPACE / HTAB =~ [ \t] linear-white-space = 1*([CRLF] LWSP-char) =~ ((\r\n)?[ \t])+ specials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / <"> / "." / "[" / "]" =~ [][()<>@,;:\\".] quoted-pair = "\" CHAR =~ \\. qtext = , "\" & CR, and including linear-white-space> =~ [^"\\\r]|((\r\n)?[ \t])+ dtext =  =~ [^][\\\r]|((\r\n)?[ \t])+ quoted-ssortingng = <"> *(qtext|quoted-pair) <"> =~ "([^"\\\r]|((\r\n)?[ \t])|\\.)*" (erratum) =~ "(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-literal = "[" *(dtext|quoted-pair) "]" =~ \[([^][\\\r]|((\r\n)?[ \t])|\\.)*] (erratum) =~ \[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] atom = 1* =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+ word = atom / quoted-string =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-ref = atom sub-domain = domain-ref / domain-literal =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] local-part = word *("." word) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* domain = sub-domain *("." sub-domain) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* addr-spec = local-part "@" domain =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*(\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*)*@((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (canonical) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))* 

RFC 5322

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I’ll try to fix the expression as soon as possible.

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*])

I believe it’s fully complient with RFC 5322 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked “(normalized)” that doesn’t accept this whitespace.

I ignored all the “obs-” rules from the RFC. This means that the regexes only match email addresses that are ssortingctly RFC 5322 compliant. If you have to match “old” addresses (as the looser grammar including the “obs-” rules does), you can use one of the RFC 822 regexes from the previous paragraph.

 VCHAR = %x21-7E =~ [!-~] ALPHA = %x41-5A / %x61-7A =~ [A-Za-z] DIGIT = %x30-39 =~ [0-9] HTAB = %x09 =~ \t CR = %x0D =~ \r LF = %x0A =~ \n SP = %x20 =~ DQUOTE = %x22 =~ " CRLF = CR LF =~ \r\n WSP = SP / HTAB =~ [\t ] quoted-pair = "\" (VCHAR / WSP) =~ \\[\t -~] FWS = ([*WSP CRLF] 1*WSP) =~ ([\t ]*\r\n)?[\t ]+ ctext = %d33-39 / %d42-91 / %d93-126 =~ []!-'*-[^-~] ("comment" is left out in the regex) ccontent = ctext / quoted-pair / comment =~ []!-'*-[^-~]|(\\[\t -~]) (not regular) comment = "(" *([FWS] ccontent) [FWS] ")" (is equivalent to FWS when leaving out comments) CFWS = (1*([FWS] comment) [FWS]) / FWS =~ ([\t ]*\r\n)?[\t ]+ atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~" =~ [-!#-'*+/-9=?AZ^-~] dot-atom-text = 1*atext *("." 1*atext) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* dot-atom = [CFWS] dot-atom-text [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* qtext = %d33 / %d35-91 / %d93-126 =~ []!#-[^-~] qcontent = qtext / quoted-pair =~ []!#-[^-~]|(\\[\t -~]) (erratum) quoted-ssortingng = [CFWS] DQUOTE ((1*([FWS] qcontent) [FWS]) / FWS) DQUOTE [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ "([]!#-[^-~ \t]|(\\[\t -~]))+" dtext = %d33-90 / %d94-126 =~ [!-Z^-~] domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ \[[\t -Z^-~]*] local-part = dot-atom / quoted-ssortingng =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+" domain = dot-atom / domain-literal =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*] addr-spec = local-part "@" domain =~ ((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?) (normalized) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*]) 

Note that some sources (notably w3c ) claim that RFC 5322 is too ssortingct on the local part (ie the part before the @-sign). This is because “..”, “a..b” and “a.” are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of a..b@example.net you should write "a..b"@example.net , which is semantically equivalent.

Further ressortingctions

SMTP (as defined in RFC 5321 ) further ressortingcts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this ssortingcter grammar, so that the matched email address can actually be used to send an email.

RFC 5321 basically leaves alone the “local” part (ie the part before the @-sign), but is ssortingcter on the domain part (ie the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of “correcting” the rules in question, using this draft and RFC 1034 as guidelines. Here’s the resulting regex.

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

Note that depending on the use case you may not want to allow for a “General-address-literal” in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the “General-address-literal” part to match malformed IPv6 addresses. Some regex processors don’t support negative lookahead. Remove the subssortingng |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole “General-address-literal” part out.

Here’s the derivation:

 Let-dig = ALPHA / DIGIT =~ [0-9A-Za-z] Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig =~ [0-9A-Za-z-]*[0-9A-Za-z] (regex is updated to make sure sub-domains are max. 63 charactes long - RFC 1034 section 3.5) sub-domain = Let-dig [Ldh-str] =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])? Domain = sub-domain *("." sub-domain) =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)* Snum = 1*3DIGIT =~ [0-9]{1,3} (suggested replacement for "Snum") ip4-octet = DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35 =~ 25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9] IPv4-address-literal = Snum 3("." Snum) =~ [0-9]{1,3}(\.[0-9]{1,3}){3} (suggested replacement for "IPv4-address-literal") ip4-address = ip4-octet 3("." ip4-octet) =~ (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement for "IPv6-hex") ip6-h16 = "0" / ( (%x49-57 / %x65-70 /%x97-102) 0*3(%x48-57 / %x65-70 /%x97-102) ) =~ 0|[1-9A-Fa-f][0-9A-Fa-f]{0,3} (not from RFC) ls32 = ip6-h16 ":" ip6-h16 / ip4-address =~ (0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement of "IPv6-addr") ip6-address = 6(ip6-h16 ":") ls32 / "::" 5(ip6-h16 ":") ls32 / [ ip6-h16 ] "::" 4(ip6-h16 ":") ls32 / [ *1(ip6-h16 ":") ip6-h16 ] "::" 3(ip6-h16 ":") ls32 / [ *2(ip6-h16 ":") ip6-h16 ] "::" 2(ip6-h16 ":") ls32 / [ *3(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 ":" ls32 / [ *4(ip6-h16 ":") ip6-h16 ] "::" ls32 / [ *5(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 / [ *6(ip6-h16 ":") ip6-h16 ] "::" =~ (((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?:: IPv6-address-literal = "IPv6:" ip6-address =~ IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::) Standardized-tag = Ldh-str =~ [0-9A-Za-z-]*[0-9A-Za-z] dcontent = %d33-90 / %d94-126 =~ [!-Z^-~] General-address-literal = Standardized-tag ":" 1*dcontent =~ [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ address-literal = "[" ( IPv4-address-literal / IPv6-address-literal / General-address-literal ) "]" =~ \[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)] Mailbox = Local-part "@" ( Domain / address-literal ) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)]) 

User input validation

A common use case is user input validation, for example on an html form. In that case it’s usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

I do not recommend ressortingcting the local part further, eg by precluding quoted ssortingngs, since we don’t know what kind of mailbox names some hosts allow (like "a..b"@example.net or even "ab"@example.net ).

I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how “.museum” invalidated [az]{2,4} ), but if you must:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?\.)*(net|org|com|info| etc… )

Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

Further considerations

When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don’t enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is ssortingctly speaking still regular, it’s not feasible to make a regex that incorporates this rule.

Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple “syntactically incorrect address”. With “vanilla” regexes this is not possible.

These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.

None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

There are plenty examples of this out on the net (and I think even one that fully validates the RFC – but it’s tens/hundreds of lines long if memory serves). People tend to get carried away validating this sort of thing. Why not just check it has an @ and at least one . and meets some simple minimum length. It’s sortingvial to enter a fake email and still match any valid regex anyway. I would guess that false positives are better than false negatives.

While deciding which characters are allowed, please remember your apostrophed and hyphenated friends. I have no control over the fact that my company generates my email address using my name from the HR system. That includes the apostrophe in my last name. I can’t tell you how many times I have been blocked from interacting with a website by the fact that my email address is “invalid”.

This regex is from Perl’s Email::Valid library. I believe it to be the most accurate, it matches all 822. And, it is based on the regular expression in the O’Reilly book:

Regular expression built using Jeffrey Friedl’s example in Mastering Regular Expressions ( http://www.ora.com/catalog/regexp/ ).

 $RFC822PAT = <<'EOF'; [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\ xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\x ff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015 "]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\ xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 -\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]* )*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\ \\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\ x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n \015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([ ^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\ \\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\ x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80- \xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015() ]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\ x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04 0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\ n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\ 015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?! [^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\ ]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\ x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01 5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff] )|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^ ()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0 15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][ ^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\ n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\ x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(? :(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80- \xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]* (?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015 ()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015() ]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0 40)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\ [^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\ xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]* )*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80 -\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x 80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t ]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\ \[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff]) *\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x 80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80 -\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015( )]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\ \x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t ]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0 15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015 ()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^( \040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]| \\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80 -\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015() ]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff ])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\ \x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x 80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015 ()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\ \\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^ (\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000- \037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\ n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]| \([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\)) [^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff \n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*( ?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\ 000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\ xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*) *\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\x ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80- \xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*) *(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\ ]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\] )[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80- \xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*( ?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80 -\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)< >@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?: \([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()] *(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*) *\)[\040\t]*)*)*>) EOF 

As you’re writing in PHP I’d advice you to use the PHP build-in validation for emails.

 filter_var($value, FILTER_VALIDATE_EMAIL) 

If you’re running a php-version lower than 5.3.6 please be aware of this issue: https://bugs.php.net/bug.php?id=53091

If you want more information how this buid-in validation works, see here: Does PHP’s filter_var FILTER_VALIDATE_EMAIL actually work?

Cal Henderson (Flickr) wrote an article called Parsing Email Adresses in PHP and shows how to do proper RFC (2)822-compliant Email Address parsing. You can also get the source code in php , python and ruby which is cc licensed .

I never bother creating with my own regular expression, because chances are that someone else has already come up with a better version. I always use regexlib to find one to my liking.

There is not one which is really usable.
I discuss some issues in my answer to Is there a php library for email address validation? , it is discussed also in Regexp recognition of email address hard?

In short, don’t expect a single, usable regex to do a proper job. And the best regex will validate the syntax, not the validity of an e-mail (jhohn@example.com is correct but it will probably bounce…).

One simple regular expression which would at least not reject any valid email address would be checking for something, followed by an @ sign and then something followed by a period and at least 2 somethings. It won’t reject anything, but after reviewing the spec I can’t find any email that would be valid and rejected.

email =~ /.+@[^@]+\.[^@]{2,}$/

You could use the one employed by the jQuery Validation plugin:

 /^((([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i 

For the most comprehensive evaluation of the best regular expression for validating an email address please see this link; ” Comparing E-mail Address Validating Regular Expressions ”

Here is the current top expression for reference purposes:

 /^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[az])\.)+[az]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i 

Not to mention that non-Latin (Chinese, Arabic, Greek, Hebrew, Cyrillic and so on) domain names are to be allowed in the near future . Everyone has to change the email regex used, because those characters are surely not to be covered by [az]/i nor \w . They will all fail.

After all, the best way to validate the email address is still to actually send an email to the address in question to validate the address. If the email address is part of user authentication (register/login/etc), then you can perfectly combine it with the user activation system. Ie send an email with a link with an unique activation key to the specified email address and only allow login when the user has activated the newly created account using the link in the email.

If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn’t look like in the right format, best is still to check if it matches basically the following regex:

 ^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$ 

Simple as that. Why on earth would you care about the characters used in the name and domain? It’s the client’s responsibility to enter a valid email address, not the server’s. Even when the client enters a syntactically valid email address like aa@bb.cc , this does not guarantee that it’s a legit email address. No one regex can cover that.

The HTML5 spec suggests a simple regex for validating email addresses:

 /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

This intentionally doesn’t comply with RFC 5322 .

Note: This requirement is a willful violation of RFC 5322 , which defines a syntax for e-mail addresses that is simultaneously too ssortingct (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted ssortingngs in manners unfamiliar to most users) to be of practical use here.

The total length could also be limited to 254 characters, per RFC 3696 errata 1690 .

For a vivid demonstration, the following monster is pretty good but still does not correctly recognize all syntactically valid email addresses: it recognizes nested comments up to four levels deep.

This is a job for a parser, but even if an address is syntactically valid, it still may not be deliverable. Sometimes you have to resort to the hillbilly method of “Hey, y’all, watch ee-us!”

 // derivative of work with the following copyright and license: // Copyright (c) 2004 Casey West. All rights reserved. // This module is free software; you can redissortingbute it and/or // modify it under the same terms as Perl itself. // see http://search.cpan.org/~cwest/Email-Address-1.80/ private static ssortingng gibberish = @" (?-xism:(?:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:(?-xism:[ ^\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+(?-xism:(?-xi sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xis m:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\ s*)+|\s+)*))+)?(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(? -xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*<(?-xism:(?-xi sm:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^( )\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*( ?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D])) |)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()< >\[\]:;@\,.\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s] +)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism: (?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s *\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xi sm:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)* (?-xism:(?-xism:[^\\])|(?-xism:\\(?-xism:[^\x0A\x0D] )))+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-x ism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+ )*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism:( ?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(? -xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s *\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+( ?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+)*)(?-xism:(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*) +|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?-x ism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-xi sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism: \\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+ )*\s*\)\s*)+|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?- xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))|(?-xism:(?-x ism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s* (?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]) )|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F() <>\[\]:;@\,.\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s ]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism :(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\ s*\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-x ism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+) *(?-xism:(?-xism:[^\\])|(?-xism:\\(?-xism:[^\x0A\x0D ])))+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\ \]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?- xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|) +)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism: (?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\( ?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[ ^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\ s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+ (?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+)*)(?-xism:(?-xism :\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism: [^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+ ))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s* )+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism :(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\( (?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A \x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?- xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-x ism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism :\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*)) +)*\s*\)\s*)+|\s+)*))))(?-xism:\s*\((?:\s*(?-xism:(?-xism:(? >[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?: \s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0 D]))|)+)*\s*\)\s*))+)*\s*\)\s*)*)" .Replace("", "\"") .Replace("\t", "") .Replace(" ", "") .Replace("\r", "") .Replace("\n", ""); private static Regex mailbox = new Regex(gibberish, RegexOptions.ExplicitCapture); 

Here’s the PHP I use. I’ve choosen this solution in the spirit of “false positives are better than false negatives” as declared by another commenter here AND with regards to keeping your response time up and server load down … there’s really no need to waste server resources with a regular expression when this will weed out most simple user error. You can always follow this up by sending a test email if you want.

 function validateEmail($email) { return (bool) ssortingpos($email,'@'); } 

According to official standard RFC 2822 valid email regex is

 (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) 

if you want to use it in Java its really very easy

 import java.util.regex.*; class regexSample { public static void main(Ssortingng args[]) { //Input the ssortingng for validation Ssortingng email = "xyz@hotmail.com"; //Set the email pattern ssortingng Pattern p = Pattern.comstack(" (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" +"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")" + "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\]"); //Match the given ssortingng with the pattern Matcher m = p.matcher(email); //check whether match is found boolean matchFound = m.matches(); if (matchFound) System.out.println("Valid Email Id."); else System.out.println("Invalid Email Id."); } } 

RFC 5322 standard:

Allows dot-atom local-part, quoted-ssortingng local-part, obsolete (mixed dot-atom and quoted-ssortingng) local-part, domain name domain, (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain, and (nested) CFWS.

 '/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD' 

RFC 5321 standard:

Allows dot-atom local-part, quoted-ssortingng local-part, domain name domain, and (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain.

 '/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD' 

De base:

Allows dot-atom local-part and domain name domain (requiring at least two domain name labels with the TLD limited to 2-6 alphabetic characters).

 "/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,126}[az]{2,6}$/iD" 

Strange that you “cannot” allow 4 characters TLDs. You are banning people from .info and .name , and the length limitation stop .travel and .museum , but yes, they are less common than 2 characters TLDs and 3 characters TLDs.

You should allow uppercase alphabets too. Email systems will normalize the local part and domain part.

For your regex of domain part, domain name cannot starts with ‘-‘ and cannot ends with ‘-‘. Dash can only stays in between.

If you used the PEAR library, check out their mail function (forgot the exact name/library). You can validate email address by calling one function, and it validates the email address according to definition in RFC822.

 public bool ValidateEmail(ssortingng sEmail) { if (sEmail == null) { return false; } int nFirstAT = sEmail.IndexOf('@'); int nLastAT = sEmail.LastIndexOf('@'); if ((nFirstAT > 0) && (nLastAT == nFirstAT) && (nFirstAT < (sEmail.Length - 1))) { return (Regex.IsMatch(sEmail, @"^[az|0-9|AZ]*([_][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*(([_][az|0-9|AZ]+)*)?@[az][az|0-9|AZ]*\.([az][az|0-9|AZ]*(\.[az][az|0-9|AZ]*)?)$")); } else { return false; } }