They have the right idea: it's a real need and a major opportunity. Airbyte[1] is a few steps ahead. I also expect that the Airflow community will wade into these waters at some point.
I'm not aware of any. I did just open this issue[0] in the Meltano project to start a discussion with the team/community. It could be an interesting iteration on the Singer Spec[1] if we find that users are interested in it and it helps solve some bottleneck challenges.
yeah, we do not push an etl pipeline through json unless we have to (and generally cannot). most etl-scale data engineering we do is almost all arrow/parquet/orc/protobuf etc, plus slow legacy odbc/json streams that we turn into typed and compact data. I think json is fine for command/metadata layers though, esp early on, but pretty core to what I look for in an etl/streaming tool is out-of-the-box foundations for the data plane
the good news is the implicitly typed json examples look arrow friendly, so users can to/from_json and not think about it when they don't care about data speed/quality, like when prototyping. there may be other data-engineering-friendly formats that'd work too.
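to make the "typed and compact" point concrete, here is a rough sketch using only the Python standard library. it is not Arrow: `array` is just a stand-in for a typed column, and the records, `to_columns`, and `to_json_lines` are made up for illustration.

```python
import json
from array import array

# Hypothetical records, shaped like rows a source might emit as JSON lines.
JSON_LINES = (
    '{"id": 1, "amount": 9.5, "currency": "USD"}\n'
    '{"id": 2, "amount": 12.0, "currency": "EUR"}\n'
    '{"id": 3, "amount": 7.25, "currency": "USD"}\n'
)


def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def to_columns(lines):
    """Pivot row-oriented JSON records into one column per field.

    Assumes every record has the same keys as the first one.
    """
    rows = [json.loads(line) for line in lines.splitlines() if line.strip()]
    columns = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(is_int(v) for v in values):
            columns[name] = array("q", values)  # signed 64-bit integers
        elif all(is_number(v) for v in values):
            columns[name] = array("d", values)  # 64-bit floats
        else:
            columns[name] = values  # anything else stays a plain list
    return columns


def to_json_lines(columns):
    """Turn the columns back into row-oriented JSON lines."""
    names = list(columns)
    row_count = len(columns[names[0]])
    return "".join(
        json.dumps({name: columns[name][i] for name in names}) + "\n"
        for i in range(row_count)
    )


columns = to_columns(JSON_LINES)
round_tripped = to_json_lines(columns)
```

the numeric columns end up as fixed-width typed buffers while the json view stays available for prototyping, which is roughly the trade-off described above.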
prefect, dask, and friends solve it by abstracting over it. you can send whatever you want, and it happens to be friendly to dataframes (pydata) / compact & typed data. but these projects seem to be more about source/sink, so encouraging structure by default would be helpful...
AFAIK Meltano uses JSON only in the interface between a tap (source) and a target, to communicate schema, state and records.
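As a rough illustration of that interface, here is a minimal sketch of the three Singer message types (SCHEMA, RECORD, STATE) passed as JSON lines. The `users` stream, its fields, and the `consume` helper are hypothetical; a real tap writes these lines to stdout and a real target reads them from stdin.

```python
import json

# What a tap might emit: one JSON object per line.
messages = [
    {
        "type": "SCHEMA",
        "stream": "users",
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
    },
    {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}},
    {"type": "STATE", "value": {"users": 1}},
]
lines = [json.dumps(message) for message in messages]


def consume(lines):
    """Hypothetical target loop: sort incoming messages by their type."""
    schemas, records, state = {}, [], None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif message["type"] == "RECORD":
            records.append((message["stream"], message["record"]))
        elif message["type"] == "STATE":
            state = message["value"]  # bookmark for resuming later
    return schemas, records, state


schemas, records, state = consume(lines)
```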
It's up to the target what it does with the JSON messages it receives, so you can for example have a target-avro that takes JSON records and outputs them as an Avro file and translates the JSON schema to the corresponding Avro schema.
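To sketch what that schema translation could look like: the mapping below is a simplified assumption, not the code of any actual target-avro, and `json_schema_to_avro` is a hypothetical helper. It only handles flat objects with primitive types.

```python
# Simplified JSON Schema -> Avro primitive mapping (an assumption for this
# sketch; nested objects, arrays, and formats like date-time are not handled).
JSON_TO_AVRO = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def avro_type(prop):
    """Translate one JSON Schema property into an Avro type."""
    json_type = prop.get("type", "string")
    types = json_type if isinstance(json_type, list) else [json_type]
    concrete = [t for t in types if t != "null"]
    if not concrete:
        return "null"
    avro = JSON_TO_AVRO[concrete[0]]  # raises KeyError for unsupported types
    # A nullable JSON type such as ["null", "string"] becomes an Avro union.
    return ["null", avro] if "null" in types else avro


def json_schema_to_avro(stream, schema):
    """Build an Avro record schema from a flat JSON Schema object."""
    return {
        "type": "record",
        "name": stream,
        "fields": [
            {"name": name, "type": avro_type(prop)}
            for name, prop in schema["properties"].items()
        ],
    }


avro_schema = json_schema_to_avro(
    "users",
    {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": ["null", "string"]},
        },
    },
)
```

A target along these lines would build the Avro schema once per SCHEMA message and then encode each incoming record against it.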
It's dependent on your use case, but the three examples you've listed here all have a slightly different approach to the market. Stitch (as mentioned in a comment below) is SaaS only. Pipelinewise is open source but, as far as I know, has no plans to build a company around it. With Meltano, we're aiming to grow the project and community and eventually build a business around it in a similar manner to what GitLab has done. Our docs[0] have more information about our current focus and roadmap if you're curious.
Stitch is SaaS only. That by nature makes it suitable for some uses, and unsuitable for others (like when you have a provision that your data can't leave your network, or when you are in a company where adding a new vendor isn't a quick process).
Meltano and Pipelinewise are open source projects that someone built for themselves but are sharing. You can just start playing with them and change the code or whatever, but there's no support to pay for.
For example, from where I'm standing the best one would be "Stitch I can self-host for free for a PoC and then eventually engage the vendor about a support contract while still self-hosting it for security reasons", but there doesn't seem to be anything like it.
The GitLab Data Team is running Meltano in production[0]. We're currently extracting Zoom data with it and have plans for several more extractors (Slack, Gmail, PTO by Roots, EdCast, and a few more). I just made this MR[1] to update the list of Extractors to include Zoom too.
And the GitLab Data Team is not alone! The Meltano Slack community (link on the homepage) is about 800 strong right now, and every day we've got people discussing their production deployments and helping new users set up their own.
PS. Like Taylor, I'm on the Meltano team at GitLab.
1. https://airbyte.io/