Blame view

node_modules/sax/README.md 8.18 KB
f7563de62   Palak Handa   first commit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
  # sax js
  
  A sax-style parser for XML and HTML.
  
  Designed with [node](http://nodejs.org/) in mind, but should work fine in
  the browser or other CommonJS implementations.
  
  ## What This Is
  
  * A very simple tool to parse through an XML string.
  * A stepping stone to a streaming HTML parser.
  * A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
    docs.
  
  ## What This Is (probably) Not
  
  * An HTML Parser - That's a fine goal, but this isn't it.  It's just
    XML.
  * A DOM Builder - You can use it to build an object model out of XML,
    but it doesn't do that out of the box.
  * XSLT - No DOM = no querying.
  * 100% Compliant with (some other SAX implementation) - Most SAX
    implementations are in Java and do a lot more than this does.
  * An XML Validator - It does a little validation when in strict mode, but
    not much.
  * A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
    masochism.
  * A DTD-aware Thing - Fetching DTDs is a much bigger job.
  
  ## Regarding `<!DOCTYPE`s and `<!ENTITY`s
  
  The parser will handle the basic XML entities in text nodes and attribute
  values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional
  entities in XML by putting them in the DTD. This parser doesn't do anything
  with that. If you want to listen to the `ondoctype` event, and then fetch
  the doctypes, and read the entities and add them to `parser.ENTITIES`, then
  be my guest.
  
  Unknown entities will fail in strict mode, and in loose mode, will pass
  through unmolested.
  
  ## Usage
  
  ```javascript
  var sax = require("./lib/sax"),
    strict = true, // set to false for html-mode
    parser = sax.parser(strict);
  
  parser.onerror = function (e) {
    // an error happened.
  };
  parser.ontext = function (t) {
    // got some text.  t is the string of text.
  };
  parser.onopentag = function (node) {
    // opened a tag.  node has "name" and "attributes"
  };
  parser.onattribute = function (attr) {
    // an attribute.  attr has "name" and "value"
  };
  parser.onend = function () {
    // parser stream is done, and ready to have more stuff written to it.
  };
  
  parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
  
  // stream usage
  // takes the same options as the parser
  var saxStream = require("sax").createStream(strict, options)
  saxStream.on("error", function (e) {
    // unhandled errors will throw, since this is a proper node
    // event emitter.
    console.error("error!", e)
    // clear the error
    this._parser.error = null
    this._parser.resume()
  })
  saxStream.on("opentag", function (node) {
    // same object as above
  })
  // pipe is supported, and it's readable/writable
  // same chunks coming in also go out.
  fs.createReadStream("file.xml")
    .pipe(saxStream)
    .pipe(fs.createWriteStream("file-copy.xml"))
  ```
  
  
  ## Arguments
  
  Pass the following arguments to the parser function.  All are optional.
  
  `strict` - Boolean. Whether or not to be a jerk. Default: `false`.
  
  `opt` - Object bag of settings regarding string formatting.  All default to `false`.
  
  Settings supported:
  
  * `trim` - Boolean. Whether or not to trim text and comment nodes.
  * `normalize` - Boolean. If true, then turn any whitespace into a single
    space.
  * `lowercase` - Boolean. If true, then lowercase tag names and attribute names
    in loose mode, rather than uppercasing them.
  * `xmlns` - Boolean. If true, then namespaces are supported.
  * `position` - Boolean. If false, then don't track line/col/position.
  * `strictEntities` - Boolean. If true, only parse [predefined XML
    entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
    (`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`)
  
  ## Methods
  
  `write` - Write bytes onto the stream. You don't have to do this all at
  once. You can keep writing as much as you want.
  
  `close` - Close the stream. Once closed, no more data may be written until
  it is done processing the buffer, which is signaled by the `end` event.
  
  `resume` - To gracefully handle errors, assign a listener to the `error`
  event. Then, when the error is taken care of, you can call `resume` to
  continue parsing. Otherwise, the parser will not continue while in an error
  state.
  
  ## Members
  
  At all times, the parser object will have the following members:
  
  `line`, `column`, `position` - Indications of the position in the XML
  document where the parser currently is looking.
  
  `startTagPosition` - Indicates the position where the current tag starts.
  
  `closed` - Boolean indicating whether or not the parser can be written to.
  If it's `true`, then wait for the `ready` event to write again.
  
  `strict` - Boolean indicating whether or not the parser is a jerk.
  
  `opt` - Any options passed into the constructor.
  
  `tag` - The current tag being dealt with.
  
  And a bunch of other stuff that you probably shouldn't touch.
  
  ## Events
  
  All events emit with a single argument. To listen to an event, assign a
  function to `on<eventname>`. Functions get executed in the this-context of
  the parser object. The list of supported events are also in the exported
  `EVENTS` array.
  
  When using the stream interface, assign handlers using the EventEmitter
  `on` function in the normal fashion.
  
  `error` - Indication that something bad happened. The error will be hanging
  out on `parser.error`, and must be deleted before parsing can continue. By
  listening to this event, you can keep an eye on that kind of stuff. Note:
  this happens *much* more in strict mode. Argument: instance of `Error`.
  
  `text` - Text node. Argument: string of text.
  
  `doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.
  
  `processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument:
  object with `name` and `body` members. Attributes are not parsed, as
  processing instructions have implementation dependent semantics.
  
  `sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
  would trigger this kind of event. This is a weird thing to support, so it
  might go away at some point. SAX isn't intended to be used to parse SGML,
  after all.
  
  `opentagstart` - Emitted immediately when the tag name is available,
  but before any attributes are encountered.  Argument: object with a
  `name` field and an empty `attributes` set.  Note that this is the
  same object that will later be emitted in the `opentag` event.
  
  `opentag` - An opening tag. Argument: object with `name` and `attributes`.
  In non-strict mode, tag names are uppercased, unless the `lowercase`
  option is set.  If the `xmlns` option is set, then it will contain
  namespace binding information on the `ns` member, and will have a
  `local`, `prefix`, and `uri` member.
  
  `closetag` - A closing tag. In loose mode, tags are auto-closed if their
  parent closes. In strict mode, well-formedness is enforced. Note that
  self-closing tags will have `closeTag` emitted immediately after `openTag`.
  Argument: tag name.
  
  `attribute` - An attribute node.  Argument: object with `name` and `value`.
  In non-strict mode, attribute names are uppercased, unless the `lowercase`
  option is set.  If the `xmlns` option is set, it will also contains namespace
  information.
  
  `comment` - A comment node.  Argument: the string of the comment.
  
  `opencdata` - The opening tag of a `<![CDATA[` block.
  
  `cdata` - The text of a `<![CDATA[` block. Since `<![CDATA[` blocks can get
  quite large, this event may fire multiple times for a single block, if it
  is broken up into multiple `write()`s. Argument: the string of random
  character data.
  
  `closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.
  
  `opennamespace` - If the `xmlns` option is set, then this event will
  signal the start of a new namespace binding.
  
  `closenamespace` - If the `xmlns` option is set, then this event will
  signal the end of a namespace binding.
  
  `end` - Indication that the closed stream has ended.
  
  `ready` - Indication that the stream has reset, and is ready to be written
  to.
  
  `noscript` - In non-strict mode, `<script>` tags trigger a `"script"`
  event, and their contents are not checked for special xml characters.
  If you pass `noscript: true`, then this behavior is suppressed.
  
  ## Reporting Problems
  
  It's best to write a failing test if you find an issue.  I will always
  accept pull requests with failing tests if they demonstrate intended
  behavior, but it is very hard to figure out what issue you're describing
  without a test.  Writing a test is also the best way for you yourself
  to figure out if you really understand the issue you think you have with
  sax-js.