Extensible Markup Language - XML

XML parsing module offers Flows for parsing, processing and writing XML documents.

Reported issues

Tagged issues at Github

Artifacts

sbt
libraryDependencies += "com.lightbend.akka" %% "akka-stream-alpakka-xml" % "0.20"
Maven
<dependency>
  <groupId>com.lightbend.akka</groupId>
  <artifactId>akka-stream-alpakka-xml_2.12</artifactId>
  <version>0.20</version>
</dependency>
Gradle
dependencies {
  compile group: 'com.lightbend.akka', name: 'akka-stream-alpakka-xml_2.12', version: '0.20'
}

XML parsing

XML processing pipeline starts with an XmlParsing.parser flow which parses a stream of ByteStrings to XML parser events.

Scala
val parse = Flow[String]
  .map(ByteString(_))
  .via(XmlParsing.parser)
  .toMat(Sink.seq)(Keep.right)
Full source at GitHub
Java
final Sink<String, CompletionStage<List<ParseEvent>>> parse =
    Flow.<String>create()
        .map(ByteString::fromString)
        .via(XmlParsing.parser())
        .toMat(Sink.seq(), Keep.right());
Full source at GitHub

To parse an XML document run XML document source with this parser.

Scala
val doc = "<doc><elem>elem1</elem><elem>elem2</elem></doc>"
val resultFuture = Source.single(doc).runWith(parse)
Full source at GitHub
Java
final String doc = "<doc><elem>elem1</elem><elem>elem2</elem></doc>";
final CompletionStage<List<ParseEvent>> resultStage =
    Source.single(doc).runWith(parse, materializer);
Full source at GitHub

XML writing

XML processing pipeline ends with an XmlWriting.writer flow which writes a stream of XML parser events to ByteStrings.

Scala
val writer: Sink[ParseEvent, Future[String]] = Flow[ParseEvent]
  .via(XmlWriting.writer)
  .map[String](_.utf8String)
  .toMat(Sink.fold[String, String]("")((t, u) => t + u))(Keep.right)
Full source at GitHub
Java
final Sink<ParseEvent, CompletionStage<String>> write =
    Flow.of(ParseEvent.class)
        .via(XmlWriting.writer())
        .map(ByteString::utf8String)
        .toMat(Sink.fold("", (acc, el) -> acc + el), Keep.right());
final Sink<ParseEvent, CompletionStage<String>> write =
    Flow.of(ParseEvent.class)
        .via(XmlWriting.writer())
        .map(ByteString::utf8String)
        .toMat(Sink.fold("", (acc, el) -> acc + el), Keep.right());
Full source at GitHub

To write an XML document run XML document source with this writer.

Scala
val listEl = List(
  StartDocument,
  StartElement(
    "book",
    namespace = Some("urn:loc.gov:books"),
    prefix = Some("bk"),
    namespaceCtx = List(Namespace("urn:loc.gov:books", prefix = Some("bk")),
                        Namespace("urn:ISBN:0-395-36341-6", prefix = Some("isbn")))
  ),
  StartElement(
    "title",
    namespace = Some("urn:loc.gov:books"),
    prefix = Some("bk")
  ),
  Characters("Cheaper by the Dozen"),
  EndElement("title"),
  StartElement(
    "number",
    namespace = Some("urn:ISBN:0-395-36341-6"),
    prefix = Some("isbn")
  ),
  Characters("1568491379"),
  EndElement("number"),
  EndElement("book"),
  EndDocument
)

val doc =
  """<?xml version='1.0' encoding='UTF-8'?><bk:book xmlns:bk="urn:loc.gov:books" xmlns:isbn="urn:ISBN:0-395-36341-6"><bk:title>Cheaper by the Dozen</bk:title><isbn:number>1568491379</isbn:number></bk:book>"""
val resultFuture: Future[String] = Source.fromIterator[ParseEvent](() => listEl.iterator).runWith(writer)
resultFuture.futureValue(Timeout(3.seconds)) should ===(doc)
Full source at GitHub
Java
final String doc =
    "<?xml version='1.0' encoding='UTF-8'?>"
        + "<bk:book xmlns:bk=\"urn:loc.gov:books\" xmlns:isbn=\"urn:ISBN:0-395-36341-6\">"
        + "<bk:title>Cheaper by the Dozen</bk:title><isbn:number>1568491379</isbn:number></bk:book>";
final List<Namespace> nmList = new ArrayList<>();
nmList.add(Namespace.create("urn:loc.gov:books", Optional.of("bk")));
nmList.add(Namespace.create("urn:ISBN:0-395-36341-6", Optional.of("isbn")));
final List<ParseEvent> docList = new ArrayList<>();
docList.add(StartDocument.getInstance());
docList.add(
    StartElement.create(
        "book",
        Collections.emptyList(),
        Optional.of("bk"),
        Optional.of("urn:loc.gov:books"),
        nmList));
docList.add(
    StartElement.create(
        "title", Collections.emptyList(), Optional.of("bk"), Optional.of("urn:loc.gov:books")));
docList.add(Characters.create("Cheaper by the Dozen"));
docList.add(EndElement.create("title"));
docList.add(
    StartElement.create(
        "number",
        Collections.emptyList(),
        Optional.of("isbn"),
        Optional.of("urn:ISBN:0-395-36341-6")));
docList.add(Characters.create("1568491379"));
docList.add(EndElement.create("number"));
docList.add(EndElement.create("book"));
docList.add(EndDocument.getInstance());

final CompletionStage<String> resultStage = Source.from(docList).runWith(write, materializer);
Full source at GitHub

XML Subslice

Use XmlParsing.subslice to filter out all elements not corresponding to a certain path.

Scala
val parse = Flow[String]
  .map(ByteString(_))
  .via(XmlParsing.parser)
  .via(XmlParsing.subslice("doc" :: "elem" :: "item" :: Nil))
  .toMat(Sink.seq)(Keep.right)
Full source at GitHub
Java
final Sink<String, CompletionStage<List<ParseEvent>>> parse =
    Flow.<String>create()
        .map(ByteString::fromString)
        .via(XmlParsing.parser())
        .via(XmlParsing.subslice(Arrays.asList("doc", "elem", "item")))
        .toMat(Sink.seq(), Keep.right());
Full source at GitHub

To get a subslice of an XML document run XML document source with this parser.

Scala
val doc =
  """
    |<doc>
    |  <elem>
    |    <item>i1</item>
    |    <item><sub>i2</sub></item>
    |    <item>i3</item>
    |  </elem>
    |</doc>
  """.stripMargin
val resultFuture = Source.single(doc).runWith(parse)
Full source at GitHub
Java
final String doc =
    "<doc>"
        + "  <elem>"
        + "    <item>i1</item>"
        + "    <item><sub>i2</sub></item>"
        + "     <item>i3</item>"
        + "  </elem>"
        + "</doc>";
final CompletionStage<List<ParseEvent>> resultStage =
    Source.single(doc).runWith(parse, materializer);
Full source at GitHub
The source code for this page can be found here.