Merge pull request #8863 from hashicorp/update_go-cty_regex

import new replace and regex_replace funcs from go-cty + documentation
This commit is contained in:
Megan Marsh 2020-03-10 13:09:09 -07:00 committed by GitHub
commit 072a71b419
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
20 changed files with 12566 additions and 16 deletions

2
go.mod
View File

@ -152,7 +152,7 @@ require (
github.com/xanzy/go-cloudstack v0.0.0-20190526095453-42f262b63ed0
github.com/yandex-cloud/go-genproto v0.0.0-20190916101622-7617782d381e
github.com/yandex-cloud/go-sdk v0.0.0-20190916101744-c781afa45829
github.com/zclconf/go-cty v1.3.1
github.com/zclconf/go-cty v1.3.2-0.20200309235747-0b5d9cf50df7
github.com/zclconf/go-cty-yaml v1.0.1
go.opencensus.io v0.22.2 // indirect
golang.org/x/crypto v0.0.0-20200117160349-530e935923ad

6
go.sum
View File

@ -71,6 +71,8 @@ github.com/apparentlymart/go-dump v0.0.0-20180507223929-23540a00eaa3 h1:ZSTrOEhi
github.com/apparentlymart/go-dump v0.0.0-20180507223929-23540a00eaa3/go.mod h1:oL81AME2rN47vu18xqj1S1jPIPuN7afo62yKTNn3XMM=
github.com/apparentlymart/go-textseg v1.0.0 h1:rRmlIsPEEhUTIKQb7T++Nz/A5Q6C9IuX2wFoYVvnCs0=
github.com/apparentlymart/go-textseg v1.0.0/go.mod h1:z96Txxhf3xSFMPmb5X/1W05FF/Nj9VFpLOpjS5yuumk=
github.com/apparentlymart/go-textseg/v12 v12.0.0 h1:bNEQyAGak9tojivJNkoqWErVCQbjdL7GzRt3F8NvfJ0=
github.com/apparentlymart/go-textseg/v12 v12.0.0/go.mod h1:S/4uRK2UtaQttw1GenVJEynmyUenKwP++x/+DdGV/Ec=
github.com/approvals/go-approval-tests v0.0.0-20160714161514-ad96e53bea43 h1:ePCAQPf5tUc5IMcUvu6euhSGna7jzs7eiXtJXHig6Zc=
github.com/approvals/go-approval-tests v0.0.0-20160714161514-ad96e53bea43/go.mod h1:S6puKjZ9ZeqUPBv2hEBnMZGcM2J6mOsDRQcmxkMAND0=
github.com/armon/circbuf v0.0.0-20150827004946-bbbad097214e/go.mod h1:3U/XgcO3hCbHZ8TKRvWD2dDTCfh9M9ya+I9JpbB7O8o=
@ -491,8 +493,8 @@ github.com/yandex-cloud/go-sdk v0.0.0-20190916101744-c781afa45829/go.mod h1:Eml0
github.com/zclconf/go-cty v1.0.0/go.mod h1:xnAOWiHeOqg2nWS62VtQ7pbOu17FtxJNW8RLEih+O3s=
github.com/zclconf/go-cty v1.2.0/go.mod h1:hOPWgoHbaTUnI5k4D2ld+GRpFJSCe6bCM7m1q/N4PQ8=
github.com/zclconf/go-cty v1.2.1/go.mod h1:hOPWgoHbaTUnI5k4D2ld+GRpFJSCe6bCM7m1q/N4PQ8=
github.com/zclconf/go-cty v1.3.1 h1:QIOZl+CKKdkv4l2w3lG23nNzXgLoxsWLSEdg1MlX4p0=
github.com/zclconf/go-cty v1.3.1/go.mod h1:YO23e2L18AG+ZYQfSobnY4G65nvwvprPCxBHkufUH1k=
github.com/zclconf/go-cty v1.3.2-0.20200309235747-0b5d9cf50df7 h1:YyicOFRGFLIyzuZCaKFhIwg1JFpi+x1hEPlfNQUHq2I=
github.com/zclconf/go-cty v1.3.2-0.20200309235747-0b5d9cf50df7/go.mod h1:nHzOclRkoj++EU9ZjSrZvRG0BXIWt8c7loYc0qXAFGQ=
github.com/zclconf/go-cty-yaml v1.0.1 h1:up11wlgAaDvlAGENcFDnZgkn0qUJurso7k6EpURKNF8=
github.com/zclconf/go-cty-yaml v1.0.1/go.mod h1:IP3Ylp0wQpYm50IHK8OZWKMu6sPJIUgKa8XhiVHura0=
go.opencensus.io v0.21.0 h1:mU6zScU4U1YAFPHEHYk+3JC4SY7JxgkqS10ZOSyksNg=

View File

@ -78,6 +78,8 @@ func Functions(basedir string) map[string]function.Function {
"pow": stdlib.PowFunc,
"range": stdlib.RangeFunc,
"reverse": stdlib.ReverseFunc,
"replace": stdlib.ReplaceFunc,
"regex_replace": stdlib.RegexReplaceFunc,
"rsadecrypt": crypto.RsaDecryptFunc,
"setintersection": stdlib.SetIntersectionFunc,
"setproduct": stdlib.SetProductFunc,

View File

@ -0,0 +1,95 @@
Copyright (c) 2017 Martin Atkins
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---------
Unicode table generation programs are under a separate copyright and license:
Copyright (c) 2014 Couchbase, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file
except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied. See the License for the specific language governing permissions
and limitations under the License.
---------
Grapheme break data is provided as part of the Unicode character database,
copright 2016 Unicode, Inc, which is provided with the following license:
Unicode Data Files include all data files under the directories
http://www.unicode.org/Public/, http://www.unicode.org/reports/,
http://www.unicode.org/cldr/data/, http://source.icu-project.org/repos/icu/, and
http://www.unicode.org/utility/trac/browser/.
Unicode Data Files do not include PDF online code charts under the
directory http://www.unicode.org/Public/.
Software includes any source code published in the Unicode Standard
or under the directories
http://www.unicode.org/Public/, http://www.unicode.org/reports/,
http://www.unicode.org/cldr/data/, http://source.icu-project.org/repos/icu/, and
http://www.unicode.org/utility/trac/browser/.
NOTICE TO USER: Carefully read the following legal agreement.
BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"),
YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
TERMS AND CONDITIONS OF THIS AGREEMENT.
IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE
THE DATA FILES OR SOFTWARE.
COPYRIGHT AND PERMISSION NOTICE
Copyright © 1991-2017 Unicode, Inc. All rights reserved.
Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Unicode data files and any associated documentation
(the "Data Files") or Unicode software and any associated documentation
(the "Software") to deal in the Data Files or Software
without restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, and/or sell copies of
the Data Files or Software, and to permit persons to whom the Data Files
or Software are furnished to do so, provided that either
(a) this copyright and permission notice appear with all copies
of the Data Files or Software, or
(b) this copyright and permission notice appear in associated
Documentation.
THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT OF THIRD PARTY RIGHTS.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THE DATA FILES OR SOFTWARE.
Except as contained in this notice, the name of a copyright holder
shall not be used in advertising or otherwise to promote the sale,
use or other dealings in these Data Files or Software without prior
written authorization of the copyright holder.

View File

@ -0,0 +1,30 @@
package textseg
import (
"bufio"
"bytes"
)
// AllTokens is a utility that uses a bufio.SplitFunc to produce a slice of
// all of the recognized tokens in the given buffer.
func AllTokens(buf []byte, splitFunc bufio.SplitFunc) ([][]byte, error) {
scanner := bufio.NewScanner(bytes.NewReader(buf))
scanner.Split(splitFunc)
var ret [][]byte
for scanner.Scan() {
ret = append(ret, scanner.Bytes())
}
return ret, scanner.Err()
}
// TokenCount is a utility that uses a bufio.SplitFunc to count the number of
// recognized tokens in the given buffer.
func TokenCount(buf []byte, splitFunc bufio.SplitFunc) (int, error) {
scanner := bufio.NewScanner(bytes.NewReader(buf))
scanner.Split(splitFunc)
var ret int
for scanner.Scan() {
ret++
}
return ret, scanner.Err()
}

View File

@ -0,0 +1,290 @@
# The following Ragel file was autogenerated with unicode2ragel.rb
# from: https://www.unicode.org/Public/emoji/12.0/emoji-data.txt
#
# It defines ["Extended_Pictographic"].
#
# To use this, make sure that your alphtype is set to byte,
# and that your input is in utf8.
%%{
machine Emoji;
Extended_Pictographic =
0xC2 0xA9 #1.1 [1] (©️) copyright
| 0xC2 0xAE #1.1 [1] (®️) registered
| 0xE2 0x80 0xBC #1.1 [1] (‼️) double exclamation mark
| 0xE2 0x81 0x89 #3.0 [1] (⁉️) exclamation question mark
| 0xE2 0x84 0xA2 #1.1 [1] (™️) trade mark
| 0xE2 0x84 0xB9 #3.0 [1] () information
| 0xE2 0x86 0x94..0x99 #1.1 [6] (↔️..↙️) left-right arrow..down...
| 0xE2 0x86 0xA9..0xAA #1.1 [2] (↩️..↪️) right arrow curving le...
| 0xE2 0x8C 0x9A..0x9B #1.1 [2] (⌚..⌛) watch..hourglass done
| 0xE2 0x8C 0xA8 #1.1 [1] (⌨️) keyboard
| 0xE2 0x8E 0x88 #3.0 [1] (⎈) HELM SYMBOL
| 0xE2 0x8F 0x8F #4.0 [1] (⏏️) eject button
| 0xE2 0x8F 0xA9..0xB3 #6.0 [11] (⏩..⏳) fast-forward button..hou...
| 0xE2 0x8F 0xB8..0xBA #7.0 [3] (⏸️..⏺️) pause button..record b...
| 0xE2 0x93 0x82 #1.1 [1] (Ⓜ️) circled M
| 0xE2 0x96 0xAA..0xAB #1.1 [2] (▪️..▫️) black small square..wh...
| 0xE2 0x96 0xB6 #1.1 [1] (▶️) play button
| 0xE2 0x97 0x80 #1.1 [1] (◀️) reverse button
| 0xE2 0x97 0xBB..0xBE #3.2 [4] (◻️..◾) white medium square..bl...
| 0xE2 0x98 0x80..0x85 #1.1 [6] (☀️..★) sun..BLACK STAR
| 0xE2 0x98 0x87..0x92 #1.1 [12] (☇..☒) LIGHTNING..BALLOT BOX WI...
| 0xE2 0x98 0x94..0x95 #4.0 [2] (☔..☕) umbrella with rain drops...
| 0xE2 0x98 0x96..0x97 #3.2 [2] (☖..☗) WHITE SHOGI PIECE..BLACK...
| 0xE2 0x98 0x98 #4.1 [1] (☘️) shamrock
| 0xE2 0x98 0x99 #3.0 [1] (☙) REVERSED ROTATED FLORAL ...
| 0xE2 0x98 0x9A..0xFF #1.1 [86] (☚..♯) BLACK LEFT POINTING INDE...
| 0xE2 0x99 0x00..0xAF #
| 0xE2 0x99 0xB0..0xB1 #3.0 [2] (♰..♱) WEST SYRIAC CROSS..EAST ...
| 0xE2 0x99 0xB2..0xBD #3.2 [12] (♲..♽) UNIVERSAL RECYCLING SYMB...
| 0xE2 0x99 0xBE..0xBF #4.1 [2] (♾️..♿) infinity..wheelchair sy...
| 0xE2 0x9A 0x80..0x85 #3.2 [6] (⚀..⚅) DIE FACE-1..DIE FACE-6
| 0xE2 0x9A 0x90..0x91 #4.0 [2] (⚐..⚑) WHITE FLAG..BLACK FLAG
| 0xE2 0x9A 0x92..0x9C #4.1 [11] (⚒️..⚜️) hammer and pick..fleur...
| 0xE2 0x9A 0x9D #5.1 [1] (⚝) OUTLINED WHITE STAR
| 0xE2 0x9A 0x9E..0x9F #5.2 [2] (⚞..⚟) THREE LINES CONVERGING R...
| 0xE2 0x9A 0xA0..0xA1 #4.0 [2] (⚠️..⚡) warning..high voltage
| 0xE2 0x9A 0xA2..0xB1 #4.1 [16] (⚢..⚱️) DOUBLED FEMALE SIGN..fu...
| 0xE2 0x9A 0xB2 #5.0 [1] (⚲) NEUTER
| 0xE2 0x9A 0xB3..0xBC #5.1 [10] (⚳..⚼) CERES..SESQUIQUADRATE
| 0xE2 0x9A 0xBD..0xBF #5.2 [3] (⚽..⚿) soccer ball..SQUARED KEY
| 0xE2 0x9B 0x80..0x83 #5.1 [4] (⛀..⛃) WHITE DRAUGHTS MAN..BLAC...
| 0xE2 0x9B 0x84..0x8D #5.2 [10] (⛄..⛍) snowman without snow..DI...
| 0xE2 0x9B 0x8E #6.0 [1] (⛎) Ophiuchus
| 0xE2 0x9B 0x8F..0xA1 #5.2 [19] (⛏️..⛡) pick..RESTRICTED LEFT E...
| 0xE2 0x9B 0xA2 #6.0 [1] (⛢) ASTRONOMICAL SYMBOL FOR ...
| 0xE2 0x9B 0xA3 #5.2 [1] (⛣) HEAVY CIRCLE WITH STROKE...
| 0xE2 0x9B 0xA4..0xA7 #6.0 [4] (⛤..⛧) PENTAGRAM..INVERTED PENT...
| 0xE2 0x9B 0xA8..0xBF #5.2 [24] (⛨..⛿) BLACK CROSS ON SHIELD..W...
| 0xE2 0x9C 0x80 #7.0 [1] (✀) BLACK SAFETY SCISSORS
| 0xE2 0x9C 0x81..0x84 #1.1 [4] (✁..✄) UPPER BLADE SCISSORS..WH...
| 0xE2 0x9C 0x85 #6.0 [1] (✅) check mark button
| 0xE2 0x9C 0x88..0x89 #1.1 [2] (✈️..✉️) airplane..envelope
| 0xE2 0x9C 0x8A..0x8B #6.0 [2] (✊..✋) raised fist..raised hand
| 0xE2 0x9C 0x8C..0x92 #1.1 [7] (✌️..✒️) victory hand..black nib
| 0xE2 0x9C 0x94 #1.1 [1] (✔️) check mark
| 0xE2 0x9C 0x96 #1.1 [1] (✖️) multiplication sign
| 0xE2 0x9C 0x9D #1.1 [1] (✝️) latin cross
| 0xE2 0x9C 0xA1 #1.1 [1] (✡️) star of David
| 0xE2 0x9C 0xA8 #6.0 [1] (✨) sparkles
| 0xE2 0x9C 0xB3..0xB4 #1.1 [2] (✳️..✴️) eight-spoked asterisk....
| 0xE2 0x9D 0x84 #1.1 [1] (❄️) snowflake
| 0xE2 0x9D 0x87 #1.1 [1] (❇️) sparkle
| 0xE2 0x9D 0x8C #6.0 [1] (❌) cross mark
| 0xE2 0x9D 0x8E #6.0 [1] (❎) cross mark button
| 0xE2 0x9D 0x93..0x95 #6.0 [3] (❓..❕) question mark..white exc...
| 0xE2 0x9D 0x97 #5.2 [1] (❗) exclamation mark
| 0xE2 0x9D 0xA3..0xA7 #1.1 [5] (❣️..❧) heart exclamation..ROTA...
| 0xE2 0x9E 0x95..0x97 #6.0 [3] (..➗) plus sign..division sign
| 0xE2 0x9E 0xA1 #1.1 [1] (➡️) right arrow
| 0xE2 0x9E 0xB0 #6.0 [1] (➰) curly loop
| 0xE2 0x9E 0xBF #6.0 [1] (➿) double curly loop
| 0xE2 0xA4 0xB4..0xB5 #3.2 [2] (⤴️..⤵️) right arrow curving up...
| 0xE2 0xAC 0x85..0x87 #4.0 [3] (⬅️..⬇️) left arrow..down arrow
| 0xE2 0xAC 0x9B..0x9C #5.1 [2] (⬛..⬜) black large square..whit...
| 0xE2 0xAD 0x90 #5.1 [1] (⭐) star
| 0xE2 0xAD 0x95 #5.2 [1] (⭕) hollow red circle
| 0xE3 0x80 0xB0 #1.1 [1] (〰️) wavy dash
| 0xE3 0x80 0xBD #3.2 [1] (〽️) part alternation mark
| 0xE3 0x8A 0x97 #1.1 [1] (㊗️) Japanese “congratulatio...
| 0xE3 0x8A 0x99 #1.1 [1] (㊙️) Japanese “secret” button
| 0xF0 0x9F 0x80 0x80..0xAB #5.1 [44] (🀀..🀫) MAHJONG TILE EAST WIN...
| 0xF0 0x9F 0x80 0xAC..0xAF #NA [4] (🀬..🀯) <reserved-1F02C>..<res...
| 0xF0 0x9F 0x80 0xB0..0xFF #5.1[100] (🀰..🂓) DOMINO TILE HOR...
| 0xF0 0x9F 0x81..0x81 0x00..0xFF #
| 0xF0 0x9F 0x82 0x00..0x93 #
| 0xF0 0x9F 0x82 0x94..0x9F #NA [12] (🂔..🂟) <reserved-1F094>..<res...
| 0xF0 0x9F 0x82 0xA0..0xAE #6.0 [15] (🂠..🂮) PLAYING CARD BACK..PL...
| 0xF0 0x9F 0x82 0xAF..0xB0 #NA [2] (🂯..🂰) <reserved-1F0AF>..<res...
| 0xF0 0x9F 0x82 0xB1..0xBE #6.0 [14] (🂱..🂾) PLAYING CARD ACE OF H...
| 0xF0 0x9F 0x82 0xBF #7.0 [1] (🂿) PLAYING CARD RED JOKER
| 0xF0 0x9F 0x83 0x80 #NA [1] (🃀) <reserved-1F0C0>
| 0xF0 0x9F 0x83 0x81..0x8F #6.0 [15] (🃁..🃏) PLAYING CARD ACE OF D...
| 0xF0 0x9F 0x83 0x90 #NA [1] (🃐) <reserved-1F0D0>
| 0xF0 0x9F 0x83 0x91..0x9F #6.0 [15] (🃑..🃟) PLAYING CARD ACE OF C...
| 0xF0 0x9F 0x83 0xA0..0xB5 #7.0 [22] (🃠..🃵) PLAYING CARD FOOL..PL...
| 0xF0 0x9F 0x83 0xB6..0xBF #NA [10] (🃶..🃿) <reserved-1F0F6>..<res...
| 0xF0 0x9F 0x84 0x8D..0x8F #NA [3] (🄍..🄏) <reserved-1F10D>..<res...
| 0xF0 0x9F 0x84 0xAF #11.0 [1] (🄯) COPYLEFT SYMBOL
| 0xF0 0x9F 0x85 0xAC #12.0 [1] (🅬) RAISED MR SIGN
| 0xF0 0x9F 0x85 0xAD..0xAF #NA [3] (🅭..🅯) <reserved-1F16D>..<res...
| 0xF0 0x9F 0x85 0xB0..0xB1 #6.0 [2] (🅰️..🅱️) A button (blood typ...
| 0xF0 0x9F 0x85 0xBE #6.0 [1] (🅾️) O button (blood type)
| 0xF0 0x9F 0x85 0xBF #5.2 [1] (🅿️) P button
| 0xF0 0x9F 0x86 0x8E #6.0 [1] (🆎) AB button (blood type)
| 0xF0 0x9F 0x86 0x91..0x9A #6.0 [10] (🆑..🆚) CL button..VS button
| 0xF0 0x9F 0x86 0xAD..0xFF #NA [57] (🆭..🇥) <reserved-1F1AD>..<res...
| 0xF0 0x9F 0x87 0x00..0xA5 #
| 0xF0 0x9F 0x88 0x81..0x82 #6.0 [2] (🈁..🈂️) Japanese “here” butt...
| 0xF0 0x9F 0x88 0x83..0x8F #NA [13] (🈃..🈏) <reserved-1F203>..<res...
| 0xF0 0x9F 0x88 0x9A #5.2 [1] (🈚) Japanese “free of charge...
| 0xF0 0x9F 0x88 0xAF #5.2 [1] (🈯) Japanese “reserved” button
| 0xF0 0x9F 0x88 0xB2..0xBA #6.0 [9] (🈲..🈺) Japanese “prohibited”...
| 0xF0 0x9F 0x88 0xBC..0xBF #NA [4] (🈼..🈿) <reserved-1F23C>..<res...
| 0xF0 0x9F 0x89 0x89..0x8F #NA [7] (🉉..🉏) <reserved-1F249>..<res...
| 0xF0 0x9F 0x89 0x90..0x91 #6.0 [2] (🉐..🉑) Japanese “bargain” bu...
| 0xF0 0x9F 0x89 0x92..0x9F #NA [14] (🉒..🉟) <reserved-1F252>..<res...
| 0xF0 0x9F 0x89 0xA0..0xA5 #10.0 [6] (🉠..🉥) ROUNDED SYMBOL FOR F...
| 0xF0 0x9F 0x89 0xA6..0xFF #NA[154] (🉦..🋿) <reserved-1F266>...
| 0xF0 0x9F 0x8A..0x8A 0x00..0xFF #
| 0xF0 0x9F 0x8B 0x00..0xBF #
| 0xF0 0x9F 0x8C 0x80..0xA0 #6.0 [33] (🌀..🌠) cyclone..shooting star
| 0xF0 0x9F 0x8C 0xA1..0xAC #7.0 [12] (🌡️..🌬️) thermometer..wind face
| 0xF0 0x9F 0x8C 0xAD..0xAF #8.0 [3] (🌭..🌯) hot dog..burrito
| 0xF0 0x9F 0x8C 0xB0..0xB5 #6.0 [6] (🌰..🌵) chestnut..cactus
| 0xF0 0x9F 0x8C 0xB6 #7.0 [1] (🌶️) hot pepper
| 0xF0 0x9F 0x8C 0xB7..0xFF #6.0 [70] (🌷..🍼) tulip..baby bottle
| 0xF0 0x9F 0x8D 0x00..0xBC #
| 0xF0 0x9F 0x8D 0xBD #7.0 [1] (🍽️) fork and knife with plate
| 0xF0 0x9F 0x8D 0xBE..0xBF #8.0 [2] (🍾..🍿) bottle with popping c...
| 0xF0 0x9F 0x8E 0x80..0x93 #6.0 [20] (🎀..🎓) ribbon..graduation cap
| 0xF0 0x9F 0x8E 0x94..0x9F #7.0 [12] (🎔..🎟️) HEART WITH TIP ON TH...
| 0xF0 0x9F 0x8E 0xA0..0xFF #6.0 [37] (🎠..🏄) carousel horse..perso...
| 0xF0 0x9F 0x8F 0x00..0x84 #
| 0xF0 0x9F 0x8F 0x85 #7.0 [1] (🏅) sports medal
| 0xF0 0x9F 0x8F 0x86..0x8A #6.0 [5] (🏆..🏊) trophy..person swimming
| 0xF0 0x9F 0x8F 0x8B..0x8E #7.0 [4] (🏋️..🏎️) person lifting weig...
| 0xF0 0x9F 0x8F 0x8F..0x93 #8.0 [5] (🏏..🏓) cricket game..ping pong
| 0xF0 0x9F 0x8F 0x94..0x9F #7.0 [12] (🏔️..🏟️) snow-capped mountai...
| 0xF0 0x9F 0x8F 0xA0..0xB0 #6.0 [17] (🏠..🏰) house..castle
| 0xF0 0x9F 0x8F 0xB1..0xB7 #7.0 [7] (🏱..🏷️) WHITE PENNANT..label
| 0xF0 0x9F 0x8F 0xB8..0xBA #8.0 [3] (🏸..🏺) badminton..amphora
| 0xF0 0x9F 0x90 0x80..0xBE #6.0 [63] (🐀..🐾) rat..paw prints
| 0xF0 0x9F 0x90 0xBF #7.0 [1] (🐿️) chipmunk
| 0xF0 0x9F 0x91 0x80 #6.0 [1] (👀) eyes
| 0xF0 0x9F 0x91 0x81 #7.0 [1] (👁️) eye
| 0xF0 0x9F 0x91 0x82..0xFF #6.0[182] (👂..📷) ear..camera
| 0xF0 0x9F 0x92..0x92 0x00..0xFF #
| 0xF0 0x9F 0x93 0x00..0xB7 #
| 0xF0 0x9F 0x93 0xB8 #7.0 [1] (📸) camera with flash
| 0xF0 0x9F 0x93 0xB9..0xBC #6.0 [4] (📹..📼) video camera..videoca...
| 0xF0 0x9F 0x93 0xBD..0xBE #7.0 [2] (📽️..📾) film projector..PORT...
| 0xF0 0x9F 0x93 0xBF #8.0 [1] (📿) prayer beads
| 0xF0 0x9F 0x94 0x80..0xBD #6.0 [62] (🔀..🔽) shuffle tracks button...
| 0xF0 0x9F 0x95 0x86..0x8A #7.0 [5] (🕆..🕊️) WHITE LATIN CROSS..dove
| 0xF0 0x9F 0x95 0x8B..0x8F #8.0 [5] (🕋..🕏) kaaba..BOWL OF HYGIEIA
| 0xF0 0x9F 0x95 0x90..0xA7 #6.0 [24] (🕐..🕧) one oclock..twelve-t...
| 0xF0 0x9F 0x95 0xA8..0xB9 #7.0 [18] (🕨..🕹️) RIGHT SPEAKER..joystick
| 0xF0 0x9F 0x95 0xBA #9.0 [1] (🕺) man dancing
| 0xF0 0x9F 0x95 0xBB..0xFF #7.0 [41] (🕻..🖣) LEFT HAND TELEPHONE R...
| 0xF0 0x9F 0x96 0x00..0xA3 #
| 0xF0 0x9F 0x96 0xA4 #9.0 [1] (🖤) black heart
| 0xF0 0x9F 0x96 0xA5..0xFF #7.0 [86] (🖥️..🗺️) desktop computer..w...
| 0xF0 0x9F 0x97 0x00..0xBA #
| 0xF0 0x9F 0x97 0xBB..0xBF #6.0 [5] (🗻..🗿) mount fuji..moai
| 0xF0 0x9F 0x98 0x80 #6.1 [1] (😀) grinning face
| 0xF0 0x9F 0x98 0x81..0x90 #6.0 [16] (😁..😐) beaming face with smi...
| 0xF0 0x9F 0x98 0x91 #6.1 [1] (😑) expressionless face
| 0xF0 0x9F 0x98 0x92..0x94 #6.0 [3] (😒..😔) unamused face..pensiv...
| 0xF0 0x9F 0x98 0x95 #6.1 [1] (😕) confused face
| 0xF0 0x9F 0x98 0x96 #6.0 [1] (😖) confounded face
| 0xF0 0x9F 0x98 0x97 #6.1 [1] (😗) kissing face
| 0xF0 0x9F 0x98 0x98 #6.0 [1] (😘) face blowing a kiss
| 0xF0 0x9F 0x98 0x99 #6.1 [1] (😙) kissing face with smilin...
| 0xF0 0x9F 0x98 0x9A #6.0 [1] (😚) kissing face with closed...
| 0xF0 0x9F 0x98 0x9B #6.1 [1] (😛) face with tongue
| 0xF0 0x9F 0x98 0x9C..0x9E #6.0 [3] (😜..😞) winking face with ton...
| 0xF0 0x9F 0x98 0x9F #6.1 [1] (😟) worried face
| 0xF0 0x9F 0x98 0xA0..0xA5 #6.0 [6] (😠..😥) angry face..sad but r...
| 0xF0 0x9F 0x98 0xA6..0xA7 #6.1 [2] (😦..😧) frowning face with op...
| 0xF0 0x9F 0x98 0xA8..0xAB #6.0 [4] (😨..😫) fearful face..tired face
| 0xF0 0x9F 0x98 0xAC #6.1 [1] (😬) grimacing face
| 0xF0 0x9F 0x98 0xAD #6.0 [1] (😭) loudly crying face
| 0xF0 0x9F 0x98 0xAE..0xAF #6.1 [2] (😮..😯) face with open mouth....
| 0xF0 0x9F 0x98 0xB0..0xB3 #6.0 [4] (😰..😳) anxious face with swe...
| 0xF0 0x9F 0x98 0xB4 #6.1 [1] (😴) sleeping face
| 0xF0 0x9F 0x98 0xB5..0xFF #6.0 [12] (😵..🙀) dizzy face..weary cat
| 0xF0 0x9F 0x99 0x00..0x80 #
| 0xF0 0x9F 0x99 0x81..0x82 #7.0 [2] (🙁..🙂) slightly frowning fac...
| 0xF0 0x9F 0x99 0x83..0x84 #8.0 [2] (🙃..🙄) upside-down face..fac...
| 0xF0 0x9F 0x99 0x85..0x8F #6.0 [11] (🙅..🙏) person gesturing NO.....
| 0xF0 0x9F 0x9A 0x80..0xFF #6.0 [70] (🚀..🛅) rocket..left luggage
| 0xF0 0x9F 0x9B 0x00..0x85 #
| 0xF0 0x9F 0x9B 0x86..0x8F #7.0 [10] (🛆..🛏️) TRIANGLE WITH ROUNDE...
| 0xF0 0x9F 0x9B 0x90 #8.0 [1] (🛐) place of worship
| 0xF0 0x9F 0x9B 0x91..0x92 #9.0 [2] (🛑..🛒) stop sign..shopping cart
| 0xF0 0x9F 0x9B 0x93..0x94 #10.0 [2] (🛓..🛔) STUPA..PAGODA
| 0xF0 0x9F 0x9B 0x95 #12.0 [1] (🛕) hindu temple
| 0xF0 0x9F 0x9B 0x96..0x9F #NA [10] (🛖..🛟) <reserved-1F6D6>..<res...
| 0xF0 0x9F 0x9B 0xA0..0xAC #7.0 [13] (🛠️..🛬) hammer and wrench..a...
| 0xF0 0x9F 0x9B 0xAD..0xAF #NA [3] (🛭..🛯) <reserved-1F6ED>..<res...
| 0xF0 0x9F 0x9B 0xB0..0xB3 #7.0 [4] (🛰️..🛳️) satellite..passenge...
| 0xF0 0x9F 0x9B 0xB4..0xB6 #9.0 [3] (🛴..🛶) kick scooter..canoe
| 0xF0 0x9F 0x9B 0xB7..0xB8 #10.0 [2] (🛷..🛸) sled..flying saucer
| 0xF0 0x9F 0x9B 0xB9 #11.0 [1] (🛹) skateboard
| 0xF0 0x9F 0x9B 0xBA #12.0 [1] (🛺) auto rickshaw
| 0xF0 0x9F 0x9B 0xBB..0xBF #NA [5] (🛻..🛿) <reserved-1F6FB>..<res...
| 0xF0 0x9F 0x9D 0xB4..0xBF #NA [12] (🝴..🝿) <reserved-1F774>..<res...
| 0xF0 0x9F 0x9F 0x95..0x98 #11.0 [4] (🟕..🟘) CIRCLED TRIANGLE..NE...
| 0xF0 0x9F 0x9F 0x99..0x9F #NA [7] (🟙..🟟) <reserved-1F7D9>..<res...
| 0xF0 0x9F 0x9F 0xA0..0xAB #12.0 [12] (🟠..🟫) orange circle..brown...
| 0xF0 0x9F 0x9F 0xAC..0xBF #NA [20] (🟬..🟿) <reserved-1F7EC>..<res...
| 0xF0 0x9F 0xA0 0x8C..0x8F #NA [4] (🠌..🠏) <reserved-1F80C>..<res...
| 0xF0 0x9F 0xA1 0x88..0x8F #NA [8] (🡈..🡏) <reserved-1F848>..<res...
| 0xF0 0x9F 0xA1 0x9A..0x9F #NA [6] (🡚..🡟) <reserved-1F85A>..<res...
| 0xF0 0x9F 0xA2 0x88..0x8F #NA [8] (🢈..🢏) <reserved-1F888>..<res...
| 0xF0 0x9F 0xA2 0xAE..0xFF #NA [82] (🢮..🣿) <reserved-1F8AE>..<res...
| 0xF0 0x9F 0xA3 0x00..0xBF #
| 0xF0 0x9F 0xA4 0x8C #NA [1] (🤌) <reserved-1F90C>
| 0xF0 0x9F 0xA4 0x8D..0x8F #12.0 [3] (🤍..🤏) white heart..pinchin...
| 0xF0 0x9F 0xA4 0x90..0x98 #8.0 [9] (🤐..🤘) zipper-mouth face..si...
| 0xF0 0x9F 0xA4 0x99..0x9E #9.0 [6] (🤙..🤞) call me hand..crossed...
| 0xF0 0x9F 0xA4 0x9F #10.0 [1] (🤟) love-you gesture
| 0xF0 0x9F 0xA4 0xA0..0xA7 #9.0 [8] (🤠..🤧) cowboy hat face..snee...
| 0xF0 0x9F 0xA4 0xA8..0xAF #10.0 [8] (🤨..🤯) face with raised eye...
| 0xF0 0x9F 0xA4 0xB0 #9.0 [1] (🤰) pregnant woman
| 0xF0 0x9F 0xA4 0xB1..0xB2 #10.0 [2] (🤱..🤲) breast-feeding..palm...
| 0xF0 0x9F 0xA4 0xB3..0xBA #9.0 [8] (🤳..🤺) selfie..person fencing
| 0xF0 0x9F 0xA4 0xBC..0xBE #9.0 [3] (🤼..🤾) people wrestling..per...
| 0xF0 0x9F 0xA4 0xBF #12.0 [1] (🤿) diving mask
| 0xF0 0x9F 0xA5 0x80..0x85 #9.0 [6] (🥀..🥅) wilted flower..goal net
| 0xF0 0x9F 0xA5 0x87..0x8B #9.0 [5] (🥇..🥋) 1st place medal..mart...
| 0xF0 0x9F 0xA5 0x8C #10.0 [1] (🥌) curling stone
| 0xF0 0x9F 0xA5 0x8D..0x8F #11.0 [3] (🥍..🥏) lacrosse..flying disc
| 0xF0 0x9F 0xA5 0x90..0x9E #9.0 [15] (🥐..🥞) croissant..pancakes
| 0xF0 0x9F 0xA5 0x9F..0xAB #10.0 [13] (🥟..🥫) dumpling..canned food
| 0xF0 0x9F 0xA5 0xAC..0xB0 #11.0 [5] (🥬..🥰) leafy green..smiling...
| 0xF0 0x9F 0xA5 0xB1 #12.0 [1] (🥱) yawning face
| 0xF0 0x9F 0xA5 0xB2 #NA [1] (🥲) <reserved-1F972>
| 0xF0 0x9F 0xA5 0xB3..0xB6 #11.0 [4] (🥳..🥶) partying face..cold ...
| 0xF0 0x9F 0xA5 0xB7..0xB9 #NA [3] (🥷..🥹) <reserved-1F977>..<res...
| 0xF0 0x9F 0xA5 0xBA #11.0 [1] (🥺) pleading face
| 0xF0 0x9F 0xA5 0xBB #12.0 [1] (🥻) sari
| 0xF0 0x9F 0xA5 0xBC..0xBF #11.0 [4] (🥼..🥿) lab coat..flat shoe
| 0xF0 0x9F 0xA6 0x80..0x84 #8.0 [5] (🦀..🦄) crab..unicorn
| 0xF0 0x9F 0xA6 0x85..0x91 #9.0 [13] (🦅..🦑) eagle..squid
| 0xF0 0x9F 0xA6 0x92..0x97 #10.0 [6] (🦒..🦗) giraffe..cricket
| 0xF0 0x9F 0xA6 0x98..0xA2 #11.0 [11] (🦘..🦢) kangaroo..swan
| 0xF0 0x9F 0xA6 0xA3..0xA4 #NA [2] (🦣..🦤) <reserved-1F9A3>..<res...
| 0xF0 0x9F 0xA6 0xA5..0xAA #12.0 [6] (🦥..🦪) sloth..oyster
| 0xF0 0x9F 0xA6 0xAB..0xAD #NA [3] (🦫..🦭) <reserved-1F9AB>..<res...
| 0xF0 0x9F 0xA6 0xAE..0xAF #12.0 [2] (🦮..🦯) guide dog..probing cane
| 0xF0 0x9F 0xA6 0xB0..0xB9 #11.0 [10] (🦰..🦹) red hair..supervillain
| 0xF0 0x9F 0xA6 0xBA..0xBF #12.0 [6] (🦺..🦿) safety vest..mechani...
| 0xF0 0x9F 0xA7 0x80 #8.0 [1] (🧀) cheese wedge
| 0xF0 0x9F 0xA7 0x81..0x82 #11.0 [2] (🧁..🧂) cupcake..salt
| 0xF0 0x9F 0xA7 0x83..0x8A #12.0 [8] (🧃..🧊) beverage box..ice cube
| 0xF0 0x9F 0xA7 0x8B..0x8C #NA [2] (🧋..🧌) <reserved-1F9CB>..<res...
| 0xF0 0x9F 0xA7 0x8D..0x8F #12.0 [3] (🧍..🧏) person standing..dea...
| 0xF0 0x9F 0xA7 0x90..0xA6 #10.0 [23] (🧐..🧦) face with monocle..s...
| 0xF0 0x9F 0xA7 0xA7..0xBF #11.0 [25] (🧧..🧿) red envelope..nazar ...
| 0xF0 0x9F 0xA8 0x80..0xFF #12.0 [84] (🨀..🩓) NEUTRAL CHESS KING.....
| 0xF0 0x9F 0xA9 0x00..0x93 #
| 0xF0 0x9F 0xA9 0x94..0x9F #NA [12] (🩔..🩟) <reserved-1FA54>..<res...
| 0xF0 0x9F 0xA9 0xA0..0xAD #11.0 [14] (🩠..🩭) XIANGQI RED GENERAL....
| 0xF0 0x9F 0xA9 0xAE..0xAF #NA [2] (🩮..🩯) <reserved-1FA6E>..<res...
| 0xF0 0x9F 0xA9 0xB0..0xB3 #12.0 [4] (🩰..🩳) ballet shoes..shorts
| 0xF0 0x9F 0xA9 0xB4..0xB7 #NA [4] (🩴..🩷) <reserved-1FA74>..<res...
| 0xF0 0x9F 0xA9 0xB8..0xBA #12.0 [3] (🩸..🩺) drop of blood..steth...
| 0xF0 0x9F 0xA9 0xBB..0xBF #NA [5] (🩻..🩿) <reserved-1FA7B>..<res...
| 0xF0 0x9F 0xAA 0x80..0x82 #12.0 [3] (🪀..🪂) yo-yo..parachute
| 0xF0 0x9F 0xAA 0x83..0x8F #NA [13] (🪃..🪏) <reserved-1FA83>..<res...
| 0xF0 0x9F 0xAA 0x90..0x95 #12.0 [6] (🪐..🪕) ringed planet..banjo
| 0xF0 0x9F 0xAA 0x96..0xFF #NA[1384] (🪖..🿽) <reserved-1FA96>...
| 0xF0 0x9F 0xAB..0xBE 0x00..0xFF #
| 0xF0 0x9F 0xBF 0x00..0xBD #
;
}%%

View File

@ -0,0 +1,8 @@
package textseg
//go:generate go run make_tables.go -output tables.go
//go:generate go run make_test_tables.go -output tables_test.go
//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/12.0.0/ucd/auxiliary/GraphemeBreakProperty.txt -m GraphemeCluster -p "Prepend,CR,LF,Control,Extend,Regional_Indicator,SpacingMark,L,V,T,LV,LVT,ZWJ" -o grapheme_clusters_table.rl
//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/emoji/12.0/emoji-data.txt -m Emoji -p "Extended_Pictographic" -o emoji_table.rl
//go:generate ragel -Z grapheme_clusters.rl
//go:generate gofmt -w grapheme_clusters.go

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,133 @@
package textseg
import (
"errors"
"unicode/utf8"
)
// Generated from grapheme_clusters.rl. DO NOT EDIT
%%{
# (except you are actually in grapheme_clusters.rl here, so edit away!)
machine graphclust;
write data;
}%%
var Error = errors.New("invalid UTF8 text")
// ScanGraphemeClusters is a split function for bufio.Scanner that splits
// on grapheme cluster boundaries.
func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error) {
if len(data) == 0 {
return 0, nil, nil
}
// Ragel state
cs := 0 // Current State
p := 0 // "Pointer" into data
pe := len(data) // End-of-data "pointer"
ts := 0
te := 0
act := 0
eof := pe
// Make Go compiler happy
_ = ts
_ = te
_ = act
_ = eof
startPos := 0
endPos := 0
%%{
include GraphemeCluster "grapheme_clusters_table.rl";
include Emoji "emoji_table.rl";
action start {
startPos = p
}
action end {
endPos = p
}
action emit {
return endPos+1, data[startPos:endPos+1], nil
}
ZWJGlue = ZWJ (Extended_Pictographic Extend*)?;
AnyExtender = Extend | ZWJGlue | SpacingMark;
Extension = AnyExtender*;
ReplacementChar = (0xEF 0xBF 0xBD);
CRLFSeq = CR LF;
ControlSeq = Control | ReplacementChar;
HangulSeq = (
L+ (((LV? V+ | LVT) T*)?|LV?) |
LV V* T* |
V+ T* |
LVT T* |
T+
) Extension;
EmojiSeq = Extended_Pictographic Extend* Extension;
ZWJSeq = ZWJ (ZWJ | Extend | SpacingMark)*;
EmojiFlagSeq = Regional_Indicator Regional_Indicator? Extension;
UTF8Cont = 0x80 .. 0xBF;
AnyUTF8 = (
0x00..0x7F |
0xC0..0xDF . UTF8Cont |
0xE0..0xEF . UTF8Cont . UTF8Cont |
0xF0..0xF7 . UTF8Cont . UTF8Cont . UTF8Cont
);
# OtherSeq is any character that isn't at the start of one of the extended sequences above, followed by extension
OtherSeq = (AnyUTF8 - (CR|LF|Control|ReplacementChar|L|LV|V|LVT|T|Extended_Pictographic|ZWJ|Regional_Indicator|Prepend)) (Extend | ZWJ | SpacingMark)*;
# PrependSeq is prepend followed by any of the other patterns above, except control characters which explicitly break
PrependSeq = Prepend+ (HangulSeq|EmojiSeq|ZWJSeq|EmojiFlagSeq|OtherSeq)?;
CRLFTok = CRLFSeq >start @end;
ControlTok = ControlSeq >start @end;
HangulTok = HangulSeq >start @end;
EmojiTok = EmojiSeq >start @end;
ZWJTok = ZWJSeq >start @end;
EmojiFlagTok = EmojiFlagSeq >start @end;
OtherTok = OtherSeq >start @end;
PrependTok = PrependSeq >start @end;
main := |*
CRLFTok => emit;
ControlTok => emit;
HangulTok => emit;
EmojiTok => emit;
ZWJTok => emit;
EmojiFlagTok => emit;
PrependTok => emit;
OtherTok => emit;
# any single valid UTF-8 character would also be valid per spec,
# but we'll handle that separately after the loop so we can deal
# with requesting more bytes if we're not at EOF.
*|;
write init;
write exec;
}%%
// If we fall out here then we were unable to complete a sequence.
// If we weren't able to complete a sequence then either we've
// reached the end of a partial buffer (so there's more data to come)
// or we have an isolated symbol that would normally be part of a
// grapheme cluster but has appeared in isolation here.
if !atEOF {
// Request more
return 0, nil, nil
}
// Just take the first UTF-8 sequence and return that.
_, seqLen := utf8.DecodeRune(data)
return seqLen, data[:seqLen], nil
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,335 @@
#!/usr/bin/env ruby
#
# This scripted has been updated to accept more command-line arguments:
#
# -u, --url URL to process
# -m, --machine Machine name
# -p, --properties Properties to add to the machine
# -o, --output Write output to file
#
# Updated by: Marty Schoch <marty.schoch@gmail.com>
#
# This script uses the unicode spec to generate a Ragel state machine
# that recognizes unicode alphanumeric characters. It generates 5
# character classes: uupper, ulower, ualpha, udigit, and ualnum.
# Currently supported encodings are UTF-8 [default] and UCS-4.
#
# Usage: unicode2ragel.rb [options]
# -e, --encoding [ucs4 | utf8] Data encoding
# -h, --help Show this message
#
# This script was originally written as part of the Ferret search
# engine library.
#
# Author: Rakan El-Khalil <rakan@well.com>
require 'optparse'
require 'open-uri'
ENCODINGS = [ :utf8, :ucs4 ]
ALPHTYPES = { :utf8 => "byte", :ucs4 => "rune" }
DEFAULT_CHART_URL = "http://www.unicode.org/Public/5.1.0/ucd/DerivedCoreProperties.txt"
DEFAULT_MACHINE_NAME= "WChar"
###
# Display vars & default option
TOTAL_WIDTH = 80
RANGE_WIDTH = 23
@encoding = :utf8
@chart_url = DEFAULT_CHART_URL
machine_name = DEFAULT_MACHINE_NAME
properties = []
@output = $stdout
###
# Option parsing
cli_opts = OptionParser.new do |opts|
opts.on("-e", "--encoding [ucs4 | utf8]", "Data encoding") do |o|
@encoding = o.downcase.to_sym
end
opts.on("-h", "--help", "Show this message") do
puts opts
exit
end
opts.on("-u", "--url URL", "URL to process") do |o|
@chart_url = o
end
opts.on("-m", "--machine MACHINE_NAME", "Machine name") do |o|
machine_name = o
end
opts.on("-p", "--properties x,y,z", Array, "Properties to add to machine") do |o|
properties = o
end
opts.on("-o", "--output FILE", "output file") do |o|
@output = File.new(o, "w+")
end
end
cli_opts.parse(ARGV)
unless ENCODINGS.member? @encoding
puts "Invalid encoding: #{@encoding}"
puts cli_opts
exit
end
##
# Downloads the document at url and yields every alpha line's hex
# range and description.
def each_alpha( url, property )
open( url ) do |file|
file.each_line do |line|
next if line =~ /^#/;
next if line !~ /; #{property} *#/;
range, description = line.split(/;/)
range.strip!
description.gsub!(/.*#/, '').strip!
if range =~ /\.\./
start, stop = range.split '..'
else start = stop = range
end
yield start.hex .. stop.hex, description
end
end
end
###
# Formats to hex at minimum width
def to_hex( n )
r = "%0X" % n
r = "0#{r}" unless (r.length % 2).zero?
r
end
###
# UCS4 is just a straight hex conversion of the unicode codepoint.
def to_ucs4( range )
rangestr = "0x" + to_hex(range.begin)
rangestr << "..0x" + to_hex(range.end) if range.begin != range.end
[ rangestr ]
end
##
# 0x00 - 0x7f -> 0zzzzzzz[7]
# 0x80 - 0x7ff -> 110yyyyy[5] 10zzzzzz[6]
# 0x800 - 0xffff -> 1110xxxx[4] 10yyyyyy[6] 10zzzzzz[6]
# 0x010000 - 0x10ffff -> 11110www[3] 10xxxxxx[6] 10yyyyyy[6] 10zzzzzz[6]
UTF8_BOUNDARIES = [0x7f, 0x7ff, 0xffff, 0x10ffff]
def to_utf8_enc( n )
r = 0
if n <= 0x7f
r = n
elsif n <= 0x7ff
y = 0xc0 | (n >> 6)
z = 0x80 | (n & 0x3f)
r = y << 8 | z
elsif n <= 0xffff
x = 0xe0 | (n >> 12)
y = 0x80 | (n >> 6) & 0x3f
z = 0x80 | n & 0x3f
r = x << 16 | y << 8 | z
elsif n <= 0x10ffff
w = 0xf0 | (n >> 18)
x = 0x80 | (n >> 12) & 0x3f
y = 0x80 | (n >> 6) & 0x3f
z = 0x80 | n & 0x3f
r = w << 24 | x << 16 | y << 8 | z
end
to_hex(r)
end
def from_utf8_enc( n )
n = n.hex
r = 0
if n <= 0x7f
r = n
elsif n <= 0xdfff
y = (n >> 8) & 0x1f
z = n & 0x3f
r = y << 6 | z
elsif n <= 0xefffff
x = (n >> 16) & 0x0f
y = (n >> 8) & 0x3f
z = n & 0x3f
r = x << 10 | y << 6 | z
elsif n <= 0xf7ffffff
w = (n >> 24) & 0x07
x = (n >> 16) & 0x3f
y = (n >> 8) & 0x3f
z = n & 0x3f
r = w << 18 | x << 12 | y << 6 | z
end
r
end
###
# Given a range, splits it up into ranges that can be continuously
# encoded into utf8. Eg: 0x00 .. 0xff => [0x00..0x7f, 0x80..0xff]
# This is not strictly needed since the current [5.1] unicode standard
# doesn't have ranges that straddle utf8 boundaries. This is included
# for completeness as there is no telling if that will ever change.
def utf8_ranges( range )
ranges = []
UTF8_BOUNDARIES.each do |max|
if range.begin <= max
if range.end <= max
ranges << range
return ranges
end
ranges << (range.begin .. max)
range = (max + 1) .. range.end
end
end
ranges
end
def build_range( start, stop )
size = start.size/2
left = size - 1
return [""] if size < 1
a = start[0..1]
b = stop[0..1]
###
# Shared prefix
if a == b
return build_range(start[2..-1], stop[2..-1]).map do |elt|
"0x#{a} " + elt
end
end
###
# Unshared prefix, end of run
return ["0x#{a}..0x#{b} "] if left.zero?
###
# Unshared prefix, not end of run
# Range can be 0x123456..0x56789A
# Which is equivalent to:
# 0x123456 .. 0x12FFFF
# 0x130000 .. 0x55FFFF
# 0x560000 .. 0x56789A
ret = []
ret << build_range(start, a + "FF" * left)
###
# Only generate middle range if need be.
if a.hex+1 != b.hex
max = to_hex(b.hex - 1)
max = "FF" if b == "FF"
ret << "0x#{to_hex(a.hex+1)}..0x#{max} " + "0x00..0xFF " * left
end
###
# Don't generate last range if it is covered by first range
ret << build_range(b + "00" * left, stop) unless b == "FF"
ret.flatten!
end
def to_utf8( range )
utf8_ranges( range ).map do |r|
begin_enc = to_utf8_enc(r.begin)
end_enc = to_utf8_enc(r.end)
build_range begin_enc, end_enc
end.flatten!
end
##
# Perform a 3-way comparison of the number of codepoints advertised by
# the unicode spec for the given range, the originally parsed range,
# and the resulting utf8 encoded range.
def count_codepoints( code )
code.split(' ').inject(1) do |acc, elt|
if elt =~ /0x(.+)\.\.0x(.+)/
if @encoding == :utf8
acc * (from_utf8_enc($2) - from_utf8_enc($1) + 1)
else
acc * ($2.hex - $1.hex + 1)
end
else
acc
end
end
end
def is_valid?( range, desc, codes )
spec_count = 1
spec_count = $1.to_i if desc =~ /\[(\d+)\]/
range_count = range.end - range.begin + 1
sum = codes.inject(0) { |acc, elt| acc + count_codepoints(elt) }
sum == spec_count and sum == range_count
end
##
# Generate the state maching to stdout
def generate_machine( name, property )
pipe = " "
@output.puts " #{name} = "
each_alpha( @chart_url, property ) do |range, desc|
codes = (@encoding == :ucs4) ? to_ucs4(range) : to_utf8(range)
#raise "Invalid encoding of range #{range}: #{codes.inspect}" unless
# is_valid? range, desc, codes
range_width = codes.map { |a| a.size }.max
range_width = RANGE_WIDTH if range_width < RANGE_WIDTH
desc_width = TOTAL_WIDTH - RANGE_WIDTH - 11
desc_width -= (range_width - RANGE_WIDTH) if range_width > RANGE_WIDTH
if desc.size > desc_width
desc = desc[0..desc_width - 4] + "..."
end
codes.each_with_index do |r, idx|
desc = "" unless idx.zero?
code = "%-#{range_width}s" % r
@output.puts " #{pipe} #{code} ##{desc}"
pipe = "|"
end
end
@output.puts " ;"
@output.puts ""
end
@output.puts <<EOF
# The following Ragel file was autogenerated with #{$0}
# from: #{@chart_url}
#
# It defines #{properties}.
#
# To use this, make sure that your alphtype is set to #{ALPHTYPES[@encoding]},
# and that your input is in #{@encoding}.
%%{
machine #{machine_name};
EOF
properties.each { |x| generate_machine( x, x ) }
@output.puts <<EOF
}%%
EOF

View File

@ -0,0 +1,19 @@
package textseg
import "unicode/utf8"
// ScanGraphemeClusters is a split function for bufio.Scanner that splits
// on UTF8 sequence boundaries.
//
// This is included largely for completeness, since this behavior is already
// built in to Go when ranging over a string.
func ScanUTF8Sequences(data []byte, atEOF bool) (int, []byte, error) {
if len(data) == 0 {
return 0, nil, nil
}
r, seqLen := utf8.DecodeRune(data)
if r == utf8.RuneError && !atEOF {
return 0, nil, nil
}
return seqLen, data[:seqLen], nil
}

View File

@ -6,7 +6,7 @@ import (
"math/big"
"strings"
"github.com/apparentlymart/go-textseg/textseg"
"github.com/apparentlymart/go-textseg/v12/textseg"
"github.com/zclconf/go-cty/cty"
"github.com/zclconf/go-cty/cty/convert"

View File

@ -6,7 +6,8 @@ import (
"sort"
"strings"
"github.com/apparentlymart/go-textseg/textseg"
"github.com/apparentlymart/go-textseg/v12/textseg"
"github.com/zclconf/go-cty/cty"
"github.com/zclconf/go-cty/cty/function"
"github.com/zclconf/go-cty/cty/gocty"
@ -143,8 +144,14 @@ var SubstrFunc = function.New(&function.Spec{
}
offset += totalLen
} else if length == 0 {
// Short circuit here, after error checks, because if a
// string of length 0 has been requested it will always
// be the empty string
return cty.StringVal(""), nil
}
sub := in
pos := 0
var i int

View File

@ -0,0 +1,80 @@
package stdlib
import (
"regexp"
"strings"
"github.com/zclconf/go-cty/cty"
"github.com/zclconf/go-cty/cty/function"
)
// ReplaceFunc is a function that searches a given string for another given
// substring, and replaces each occurence with a given replacement string.
// The substr argument is a simple string.
var ReplaceFunc = function.New(&function.Spec{
Params: []function.Parameter{
{
Name: "str",
Type: cty.String,
},
{
Name: "substr",
Type: cty.String,
},
{
Name: "replace",
Type: cty.String,
},
},
Type: function.StaticReturnType(cty.String),
Impl: func(args []cty.Value, retType cty.Type) (cty.Value, error) {
str := args[0].AsString()
substr := args[1].AsString()
replace := args[2].AsString()
return cty.StringVal(strings.Replace(str, substr, replace, -1)), nil
},
})
// RegexReplaceFunc is a function that searches a given string for another
// given substring, and replaces each occurence with a given replacement
// string. The substr argument must be a valid regular expression.
var RegexReplaceFunc = function.New(&function.Spec{
Params: []function.Parameter{
{
Name: "str",
Type: cty.String,
},
{
Name: "substr",
Type: cty.String,
},
{
Name: "replace",
Type: cty.String,
},
},
Type: function.StaticReturnType(cty.String),
Impl: func(args []cty.Value, retType cty.Type) (ret cty.Value, err error) {
str := args[0].AsString()
substr := args[1].AsString()
replace := args[2].AsString()
re, err := regexp.Compile(substr)
if err != nil {
return cty.UnknownVal(cty.String), err
}
return cty.StringVal(re.ReplaceAllString(str, replace)), nil
},
})
// Replace searches a given string for another given substring,
// and replaces all occurrences with a given replacement string.
func Replace(str, substr, replace cty.Value) (cty.Value, error) {
return ReplaceFunc.Call([]cty.Value{str, substr, replace})
}
func RegexReplace(str, substr, replace cty.Value) (cty.Value, error) {
return RegexReplaceFunc.Call([]cty.Value{str, substr, replace})
}

4
vendor/modules.txt vendored
View File

@ -97,6 +97,8 @@ github.com/antihax/optional
github.com/apparentlymart/go-cidr/cidr
# github.com/apparentlymart/go-textseg v1.0.0
github.com/apparentlymart/go-textseg/textseg
# github.com/apparentlymart/go-textseg/v12 v12.0.0
github.com/apparentlymart/go-textseg/v12/textseg
# github.com/approvals/go-approval-tests v0.0.0-20160714161514-ad96e53bea43
github.com/approvals/go-approval-tests
github.com/approvals/go-approval-tests/reporters
@ -638,7 +640,7 @@ github.com/yandex-cloud/go-sdk/pkg/retry
github.com/yandex-cloud/go-sdk/pkg/sdkerrors
github.com/yandex-cloud/go-sdk/pkg/singleflight
github.com/yandex-cloud/go-sdk/sdkresolvers
# github.com/zclconf/go-cty v1.3.1
# github.com/zclconf/go-cty v1.3.2-0.20200309235747-0b5d9cf50df7
github.com/zclconf/go-cty/cty
github.com/zclconf/go-cty/cty/convert
github.com/zclconf/go-cty/cty/function

View File

@ -0,0 +1,46 @@
---
layout: "docs"
page_title: "regexreplace - Functions - Configuration Language"
sidebar_current: "configuration-functions-string-regexreplace"
description: |-
The regexreplace function searches a given string for another given substring,
and replaces all occurrences with a given replacement string. The substring
argument can be a valid regular expression or a string.
---
# `regexreplace` Function
`regexreplace` searches a given string for another given substring, and
replaces each occurrence with a given replacement string. The substring
argument can be a valid regular expression or a string.
```hcl
regexreplace(string, substring, replacement)
```
`substring` should not be wrapped in forward slashes, it is always treated as a
regular expression. The `replacement` string can incorporate captured strings
from the input by using an `$n` or `${n}` sequence, where `n` is the index or
name of a capture group.
## Examples
```
> regexreplace("hello world", "world", "everybody")
hello everybody
> regexreplace("hello world", "w.*d", "everybody")
hello everybody
> regexreplace("-ab-axxb-", "a(x*)b", "$1W)
---
> regexreplace("-ab-axxb-", "a(x*)b", "${1}W")
-W-xxW-
```
## Related Functions
- [`replace`](./replace.html) searches a given string for another given
substring, and replaces all occurrences with a given replacement string.

View File

@ -17,24 +17,17 @@ each occurrence with a given replacement string.
replace(string, substring, replacement)
```
If `substring` is wrapped in forward slashes, it is treated as a regular
expression, using the same pattern syntax as
[`regex`](./regex.html). If using a regular expression for the substring
argument, the `replacement` string can incorporate captured strings from
the input by using an `$n` sequence, where `n` is the index or name of a
capture group.
## Examples
```
> replace("1 + 2 + 3", "+", "-")
1 - 2 - 3
> replace("hello world", "/w.*d/", "everybody")
> replace("hello world", "world", "everybody")
hello everybody
```
## Related Functions
- [`regex`](./regex.html) searches a given string for a substring matching a
given regular expression pattern.
- [`regexreplace`](./regexreplace.html) searches a given string for another given substring,
and replaces each occurrence with a given replacement string.

View File

@ -72,6 +72,14 @@
<a href="/docs/configuration/from-1.5/functions/string/lower.html">lower</a>
</li>
<li <%= sidebar_current("configuration-functions-string-replace") %>>
<a href="/docs/configuration/from-1.5/functions/string/replace.html">replace</a>
</li>
<li <%= sidebar_current("configuration-functions-string-regexreplace") %>>
<a href="/docs/configuration/from-1.5/functions/string/regexreplace.html">regexreplace</a>
</li>
<li <%= sidebar_current("configuration-functions-string-split") %>>
<a href="/docs/configuration/from-1.5/functions/string/split.html">split</a>
</li>